Which statistical tool should I use?

How should I use it?

How  can  I  interpret  my  results? 

SUCH QUESTIONS plague every engineer,
scientist, research worker, student, and
teacher, and they will find this book an
indispensable reference which contains
the answers to these and many other
questions.

Modern techniques are presented in
ways conducive to easy learning and
application. The most difficult part of
statistical method is learning when and
where to apply a particular technique.
Statistics in Research carefully indicates
the assumptions underlying each technique
so that it will be applied properly
within its limitations.

Especially noteworthy features include:

* emphasis on theory for a thorough
understanding of fundamentals

* a comprehensive coverage of
regression analysis

* useful check-lists on the various
aspects of experimental design

* presentation of rigorously developed
procedures in the form best suited
for computation

* inclusion of a number of nonparametric
techniques in recognition of the
growing importance of this area
in research

* a separate chapter on quality control

* an excellent collection of important
and useful tables with standardized
formats

* statistical inference (both estimation
and testing hypotheses) well
organized, thoroughly discussed

THE PAST FEW YEARS have seen many changes in the science of statistics.
New techniques of analysis and inference have been developed
by mathematical statisticians while applied statisticians have been
busily engaged in applying both old and new techniques in novel
situations. These facts, plus the tendency toward an increased use of
mathematics in research, indicated that a revision of this text was
needed.

This  revised  edition  has  been  prepared  with  the  following  goals  in 
view:  (1)  to  provide  a  book  giving  those  statistical  methods  that  have 
been  found  useful  by  workers  in  many  areas  of  scientific  research,  (2) 
to  present  these  methods  as  integral  parts  of  a  complete  discipline,  and 
(3)  to  provide  a  textbook  that  will  facilitate  teaching  the  science  of 
statistics. 

In attempting to achieve the above goals, I have taken the position
that it is possible to write a book which will prove acceptable to
students, teachers, and research workers in many fields of specialization.
Consequently, this book presents the techniques of modern statistics as
statistical methods per se. To demonstrate the universality of statistical
methods, many examples from varied fields of application are included.

In this book considerable attention has been given to the assumptions
underlying the techniques presented. Without a thorough understanding
of the limitations of various techniques, one might apply them in
situations where they should not be used. The learning of methods is
easy; learning when and where to use them is not so easy. I have
attempted to achieve a reasonable balance between these two ends.

This edition has been designed in such a manner that it should prove
useful for several purposes, namely: (1) as a text for a standard course
in statistical methods, (2) as a text for an integrated course in theory
and methods for students in engineering and the physical sciences, and
(3) as a reference book for research workers and other users of statistical
methods, whether they be affiliated with government, industry,
research institutes, or universities.

Because of the multiple-purpose design of the book, some topics will
be of interest only to special groups. In addition, changes in order of
presentation by individual teachers will also occur. However, it is my
belief that a reasonable compromise among descriptive statistics,
mathematical statistics, statistical methods, and the design and
analysis of experiments has been achieved, and that the book will prove
suitable for all the purposes for which it was planned.

I am indebted to Sir Ronald A. Fisher, to Dr. Frank Yates, and
to Oliver and Boyd Ltd., Edinburgh, for permission to reprint Table III
from Statistical Tables for Biological, Agricultural and Medical Research.
I am also indebted to Dr. O. L. Davies and to Oliver and Boyd Ltd.,
[...] 7.7, 7.72, E, E.I, G, and H from the second edition of The Design and
Analysis of Industrial Experiments. Many other persons have also
graciously given permission for the reproduction of published material,
and acknowledgment has been made at the appropriate places in the
text. The author is deeply appreciative and wishes to express his
thanks for their cooperation.

CHAPTER 1

THE  ROLE  OF  STATISTICS 
IN  RESEARCH 

EVERY DAY each of us engages in some observation in which statistics
is used. Such common events as noting the weather forecast, weighing
oneself, checking the position of a favorite ball team in its league, or
testing a new food product are typical. The element of statistics creeps
in when you mentally evaluate your research. In weighing yourself, you
automatically compare your observation with your average weight
(deviation from the mean) and conclude the present weight is usual
(no significance to the difference) or unusual (a significant difference),
basing your judgment upon previous measurements of your weight and
your knowledge of the variation generally observed. These common
results are easily obtained, are of only local importance, and are soon
forgotten. However, the formal research which means so much to
improving man's lot is of infinitely greater importance and must be
conducted with much greater care. It is with the latter type of research
that this book is concerned.
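
The weighing illustration above can be put in numerical form. What follows
is only a sketch under stated assumptions: the list of earlier weights is
hypothetical, and the cutoff of two standard deviations for calling a
reading "unusual" is an assumed rule of thumb, not a procedure given in
the text.

```python
from statistics import mean, stdev

# Hypothetical earlier weights in pounds (illustrative values only).
previous_weights = [150.0, 151.0, 149.0, 150.5, 149.5]

def deviation_in_sd_units(previous, today):
    """Deviation of today's reading from the mean of earlier readings,
    expressed in units of their sample standard deviation."""
    return (today - mean(previous)) / stdev(previous)

def is_unusual(previous, today, cutoff=2.0):
    """Judge a reading unusual when it lies more than `cutoff` standard
    deviations from the mean (the cutoff is an assumption)."""
    return abs(deviation_in_sd_units(previous, today)) > cutoff

if __name__ == "__main__":
    for reading in (150.4, 153.0):
        label = "unusual" if is_unusual(previous_weights, reading) else "usual"
        print(reading, label)
```

The judgment rests on the same two ingredients named in the paragraph:
the average of previous measurements and the variation generally observed.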

1.1      THE  NATURE  AND   PURPOSE  OF  RESEARCH 

Research, according to Webster, is studious inquiry or examination:
critical and exhaustive investigation or experimentation having for its
aim the discovery of new facts and their correct interpretation. It also
aims at revising accepted conclusions, theories, or laws in the light of
newly discovered facts or the practical applications of such new or
revised conclusions. Research, therefore, means continued search for
knowledge and understanding; scientific research is continued research
using scientific methods. Scientific research is essentially compounded
of two elements: observation, by which knowledge of certain facts is
obtained through sense-perception; and reasoning, by which the meaning
of these facts, their interrelation, and their relation to the existing
body of knowledge are ascertained insofar as the present state of
knowledge and the investigator's ability permit.

In any discussion of research, two important facts should be noted.
They are: (1) there is an ever increasing trend towards extreme
specialization on the part of individual scientists, and (2) most research
problems are such that many disciplines and fields of specialization can
contribute in a significant manner to their solutions. Thus, it is evident
that more and more research will be handled on an interdisciplinary
team basis rather than by individual scientists working in "solitary
confinement." (NOTE: This is not to say that very little individual
research will continue to be performed. Such research always should
and always will be performed. The statement was meant to imply
only that research is becoming predominantly a team or cooperative
effort.)

In summary, research is an inquiry into the nature of, the reasons for,
and the consequences of any particular set of circumstances, whether
these circumstances are experimentally controlled or recorded just as
they occur. Further, research implies the researcher is interested in
more than particular results: he is interested in the repeatability of
the results and in their extension to more complicated and general
situations.

1.2  RESEARCH   AND  SCIENTIFIC   METHOD 

Although the techniques of investigation may vary considerably
from one science to another, the philosophy common to all is generally
referred to as scientific method. There are, perhaps, as many definitions
of scientific method as there are workers in research. For our purposes,
the following will be used: Since the ideal of science is to achieve a
systematic interrelation of facts, scientific method must be a pursuit of
this ideal by experimentation, observation, logical arguments from accepted
postulates, and a combination of these three in varying proportions.
Therefore, research and scientific method are closely related, if not one
and the same thing.

1.3  WHAT   IS  STATISTICS? 

Statistics has often been classified as a method of research along
with, or in opposition to, such methods as case studies, the historical
approach, and the experimental method. Since this classification
frequently leads to confused and incorrect thinking, it is not wise. It is
better to regard statistics as supplying a kit of tools which can be
extremely valuable in research. This book will stress gaining an
understanding of these tools and learning which tool should be used in
various situations arising in scientific research. Only when you know which
tool to use, how to use it, and how to interpret your results can you
hope to do productive research. To summarize: the science of statistics
has much to offer the research worker in planning, analyzing, and
interpreting the results of his investigations, and this book is devoted to
an exposition of those methods and techniques that have proved useful
in many fields of inquiry.

As is the case with many words in the English language, the word
statistics is used in a variety of ways, each correct in its own sphere.
In the plural sense, it is usually taken to be synonymous with data.
However, to the statistician, there is another meaning of the word.
This meaning is the plural of the word statistic, which refers to a
quantity calculated from sample observations. (These terms will be defined
in considerable detail in later chapters.) In the singular sense, statistics
is a science, and it is in this sense the word will be employed most
frequently in this book. The science of statistics deals with:

(1)  Collecting  and  summarizing  data. 

(2)  Designing  experiments  and  surveys. 

(3)  Measuring  the  magnitude  of  variation  in  both  experimental 
and  survey  data. 

(4)  Estimating  population  parameters  and  providing  various
measures  of  the  accuracy  and  precision  of  these  estimates.

(5)  Testing  hypotheses  about  populations. 

(6)  Studying  relationships  among  two  or  more  variables. 

1.4  STATISTICS  AND   RESEARCH 

As  indicated  in  the  preceding  section,  statistics  enters  into  research 
and/or  scientific  method  through  experimentation  and  observation. 
That  is,  experimental  and  survey  investigations  are  integral  parts  of 
scientific  method,  and  these  procedures  invariably  lead  to  the  use  of 
statistical  techniques.  Since  statistics,  when  properly  used,  makes  for 
more  efficient  research,  it  is  recommended  that  all  researchers  become 
familiar  with  the  basic  concepts  and  techniques  of  this  useful  science. 

Because statistics is such a valuable tool for the researcher, it
sometimes gets overworked. That is, there are many cases where statistics
is used as a crutch for poorly conceived and/or executed research. In
addition, there are cases in which statistics is employed in good faith
but, unfortunately, insufficient attention is paid to the assumptions
required for a valid use of the methods employed. For these and other
reasons, it is essential that the user of statistics clearly understands the
techniques he employs. Consequently, in this book careful attention
will be given to both the methods and the underlying assumptions in
the hope that such an approach will lead to the proper application and
use of statistics in scientific research.

1.5  FURTHER  REMARKS  ON  SCIENCE,  SCIENTIFIC  METHOD,  AND  STATISTICS

In the preceding sections, your attention has been called to the close
connection that statistics has with experimentation, scientific method,
and research. However, in each case, the discussion was quite brief.
Because these various topics and their interrelationships are so
important to the remainder of this book, a few additional remarks are
justified. To expedite the discussion, the following questions and answers
have been devised:

What  Is  Logic? 

Logic deals with the relation of implication among propositions, that
is, the relation between premises and conclusions. In scientific method,
logic aids in formulating our propositions explicitly and accurately so
that their possible alternatives become clear. When faced with
alternative hypotheses, logic develops their consequences so that when these
consequences are compared with observable phenomena we have a
means of testing which hypotheses are to be eliminated and which
one is most in harmony with the observed facts.

What  Is  Science? 

Science is knowledge which is general and systematic, that is, knowledge
from which specific propositions are deduced in accordance with a few
general principles. Although all the sciences differ, a universal feature
is "scientific method," which consists of searching for general laws
which govern behavior and of asking such questions as: Is it so? To
what extent is it so? Why is it so? What general conditions or
considerations determine it to be so?

What  Is  Scientific  Method? 

Scientific method is the pursuit of truth as determined by logical
considerations. The ideal of science is to achieve a systematic
interrelation of facts; scientific method, using the approach of "systematic
doubt," attempts to discover what the facts really are.

What  Is  Experimentation? 

The function of experimentation is the elimination of untenable
theories. Experimentation is used to test hypotheses and to discover
new relationships among variables. It must be remembered, however,
that no hypothesis which states a general proposition can be
demonstrated to be absolutely true; only probable inferences are possible.

What  Part  Does  Experimentation  Play  in  Scientific  Method? 

Experimentation is only a means toward an end. It is a tool of
scientific method. Conclusions drawn from experimental data are frequently
criticized. Such criticisms are usually based on one or more of the
following arguments: (1) the interpretation is faulty, (2) the original
assumptions are faulty, or (3) the experiment was poorly designed or
badly executed. Obviously, careful attention should be given to the
design of the experiment so that the procedures used are both valid and
efficient.

What  Is  Experimental  Design? 

Experimental design is the plan used in experimentation. It involves
the assignment of treatments to the experimental units and a thorough
understanding of the analysis to be performed when the data become
available.

What  Is  the  Relationship  Between  Statistics  and  Experimental  Design?

Statistics enters into experimental design because, even in the best
planned experiments, one cannot control all the factors and because
one wishes to make inferences based on the observed sample data. To
be of any practical use, these uncertain inferences must be accompanied
by probability statements expressing the degree of confidence which
the researcher has in such inferences. To make certain that such
probability statements will be possible, the experiments should be designed
in accordance with the principles of the science of statistics.

1.6      APPLICATIONS  OF  STATISTICS    IN    RESEARCH 

Early applications of statistics were mainly concerned with
reduction of large amounts of observed data to the point where general
trends (if they existed) became apparent. At the same time, emphasis
in many sciences turned from the study of individuals to the study of
the behavior of aggregates of individuals. Statistical methods were
admirably suited to such studies, aggregate data fitting consistently with
the concept of a population.

The next major development in statistics arose to meet the need for
improved analytical tools in the agricultural and biological sciences.
Better analytical tools were needed to improve the process of
interpretation of, and generalization from, sample data. For example, the
farmer is faced with the task of maintaining a high level of
productivity of field crops. To aid him, the agronomist conducts an endless
number of experiments to determine differences among yields of various
crop varieties, effects of various fertilizers, and the best methods of
cultivation. On the basis of the results of his experiments, he is
expected to make accurate and useful recommendations to the farm
operator. Clearly then, statistics, being a science of inductive inference
using probabilistic methods, should be of great value to the researcher
in agronomy.

In early agronomic experimentation, in order to compare a number
of fertilizers, it was thought necessary to devote only a single plot to
each treatment and determine yields in order to arrive at valid
conclusions concerning relative values of the treatments. However, the
agronomists soon found that the yields of a series of plots treated alike
differed greatly among themselves, even when soil conditions appeared
uniform and experimental conditions were carefully designed to reduce
errors in harvesting. For this reason, it became necessary to find some
means for determining whether differences in yields were due to
differences in treatments or to uncontrollable factors which also
contribute to the variability of plot yields. Statistical methods were
applied, and their value in scientific investigation of agronomic practices
was soon proved.
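
One common way of weighing treatment differences against the variation
among plots treated alike is the analysis-of-variance F ratio. The sketch
below is offered only as an illustration: the plot yields are hypothetical,
the function name is ours, and nothing here is claimed to be the procedure
the early agronomists actually followed.

```python
from statistics import mean

def f_ratio(groups):
    """One-way analysis-of-variance F ratio.

    `groups` is a list of lists, one list of plot yields per treatment.
    The numerator measures variation among treatment means; the
    denominator measures variation among plots treated alike.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    between_ss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    within_ss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    between_ms = between_ss / (k - 1)
    within_ms = within_ss / (n_total - k)
    return between_ms / within_ms

# Hypothetical yields for three fertilizers, four plots each.
yields = [
    [40.0, 42.0, 41.0, 43.0],
    [45.0, 47.0, 46.0, 48.0],
    [41.0, 40.0, 43.0, 42.0],
]

if __name__ == "__main__":
    print(f_ratio(yields))
```

A large ratio suggests that the treatments differ by more than the
uncontrolled plot-to-plot variation alone would explain. Note that with a
single plot per treatment the denominator could not be computed at all,
which is the difficulty described above.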

Closely related to agronomy is the science of plant breeding. The
ultimate objective of any plant-breeding research program is the
development of improved varieties or hybrids. A variety may be
improved in many possible ways, e.g., in ability to use plant nutrients,
in disease or insect resistance, in cold tolerance, or in its suitability to
the needs or fancies of the grower and/or consumer. Plants are
organisms conditioned by genetic factors and by the environment in which
they grow. The plant breeder, therefore, utilizes the principles of
genetics in attempting to improve inheritable characteristics of plant
varieties, just as the producer attempts to obtain high production by
maintaining a favorable environment. However, results of past genetic
studies do not provide all the answers relative to the inheritance of
plant characteristics. Thus, plant breeders continually carry out basic
genetic research in each crop, along with practical plant-breeding
procedures, in order to ensure future progress.

Development of a superior new variety by hybridization is seldom
a haphazard occurrence. Usually the breeder has in mind the
characteristics desired for his particular purpose or area. Growing many plant
selections to decide which excel in a quantitatively inherited character
requires growing them in a randomized, replicated field design. Choice
of design depends on the numbers involved; the uniformity of the soil;
the accuracy and precision of the particular estimates deemed
necessary to get the desired results; the time, effort, and money available;
and perhaps other factors. The data collected are then analyzed in
accordance with the plan of the experiment, which was designed to
make possible proper comparisons among the strains being tested. The
statistical methods employed must, of course, have a logical
relationship to the biological processes under consideration, as well as to the
way in which the experiment was conducted, if they are to be useful.
After the data have been analyzed statistically, the results must be
interpreted in view of the assumptions made and of the existing
knowledge so that some conclusion may be reached with regard to
accepting or rejecting the hypotheses being tested. Selection of the
strain to be released as a variety, or of those to be tested further,
may then be made with assurance that the decision will, in all
likelihood, be a reasonable one.

Other research areas in which good use of statistical theories and
methods is made are poultry breeding, animal breeding, and animal
nutrition. Poultry breeding, for example, is concerned with the raising
of more efficient and more productive fowl. Increased egg production,
egg size, egg color, interior egg quality, more efficient meat production,
long life, disease resistance, and high fertility are some of the factors
with which the poultry breeder is actively concerned. If a statistically
sound research program is adopted, the researcher will be able to reach
defensible conclusions and bring about more efficient use of resources.

One of the more important uses of statistics in breeding work is the
separation of environmental and hereditary effects. The literature of
the field is full of reports dealing with this type of research, both with
poultry and domestic animals. For the reader interested in this
particular area of research, we refer to such writers as Hutt (25) and
Lush (29).¹

¹ Numbers in parentheses designate references listed at end of chapter.

In the field of animal nutrition, many experiments have been devised
to discover the significance of various vitamins in the different phases
of animal production. In such investigations, several groups of animals,
as homogeneous as possible, are selected for experimentation. These
homogeneous groups are usually formed by considering such criteria as
age, weight, sex, heredity, vigor, and previous nutrition. A check group
is chosen and fed a standard ration. The other groups are fed different
levels of the vitamin in question, one of them on a ration a great deal
higher than the standard ration for the vitamin and another on a ration
containing little, or none, of the vitamin. The remainder of the groups
are fed rations somewhere between the extremes. The animals are on
the randomly assigned rations for a given period of time, and the
researcher records such data as daily gain in weight, economy of gain,
livability, etc. If the experiment has been properly designed in
accordance with established statistical principles, conclusions of great value
to the farmer may then be drawn. Of course, much work of a more
complex nature than this simple example has also been done in animal
nutrition research. Consultation of technical journals in this field will
reveal many instances where statistics has been of great help.
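
The random assignment of rations mentioned above can be sketched as
follows. This is a minimal illustration under assumptions of our own:
the animals are identified by number, every ration receives the same
number of animals, and the ration labels and the function name are
hypothetical rather than taken from any particular experiment.

```python
import random

def assign_rations(animals, rations, seed=None):
    """Randomly divide `animals` into equal-sized groups, one per ration.

    A seed may be supplied so that the assignment can be reproduced.
    """
    if len(animals) % len(rations) != 0:
        raise ValueError("animals must divide evenly among the rations")
    rng = random.Random(seed)
    shuffled = list(animals)
    rng.shuffle(shuffled)          # chance, not the researcher, decides
    size = len(shuffled) // len(rations)
    return {
        ration: shuffled[i * size:(i + 1) * size]
        for i, ration in enumerate(rations)
    }

if __name__ == "__main__":
    # Twelve hypothetical animals and four hypothetical vitamin levels.
    groups = assign_rations(range(1, 13),
                            ["check", "none", "medium", "high"], seed=1)
    for ration, members in groups.items():
        print(ration, sorted(members))
```

Leaving the allocation to chance guards against the researcher's
unintentionally favoring one ration with the more vigorous animals.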

In the past, many persons thought statistics had no place in the
so-called "exact sciences" such as chemistry, physics, and the various
branches of engineering. These fields are concerned with exact
measurement, with quantities that can be measured with a ruler, thermometer,
flow meter, thickness gauge, telescope, or pressure gauge. Therefore,
the doubters asked, why use a "pseudo-science" (statistics) that at
best merely estimates quantities? As the true meaning of statistics and
its application has come to wider attention, these persons have readily
admitted there is indeed a place for this important tool in the exact
sciences. In fact, it has become apparent that all of these sciences
themselves are based on statistical concepts. For example, it is evident
that the pressure exerted by a gas is actually an average pressure, that
is, an average effect of forces exerted by individual molecules as they
strike the wall of a container. A similar situation is true in regard to
temperature.

Since the popularly accepted theory is that all matter is made up of
small particles, it does not require much imagination to see that a
statistical approach is the logical one to adopt in investigations of the
ultimate nature of matter. Such particles are actually part of an almost
inconceivably large population, one that is, for all practical
purposes, our closest approach to the infinite population. All of these
particles exhibit individual behavior characteristics. With the
comparatively crude devices of the exact sciences we can generally only
note the results of group behavior (an average effect), and until
recently science has been limited to this. But even in these crude
applications statistics plays its role. For instance, examine the chart of the
elements in any chemistry classroom. The atomic weights shown on
this chart are actually "weighted averages" of the atomic weights of
individual isotopes of the given element, the "weights" being the
frequency of occurrence of each isotope in a normal or naturally occurring
mixture.
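
The weighted average described above is easily computed. As a brief
sketch, take chlorine, which occurs naturally as two isotopes. The
isotopic masses and relative abundances used below are approximate,
rounded values supplied for illustration; they do not come from the text.

```python
def weighted_average(values, weights):
    """Weighted average: sum of value times weight, divided by total weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Approximate isotopic masses (atomic mass units) and natural
# abundances (fractions) for chlorine-35 and chlorine-37.
isotope_masses = [34.96885, 36.96590]
abundances = [0.7576, 0.2424]

chlorine_atomic_weight = weighted_average(isotope_masses, abundances)

if __name__ == "__main__":
    print(round(chlorine_atomic_weight, 2))
```

The result agrees, to two decimal places, with the value of about 35.45
shown for chlorine on the usual chart of the elements.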

Statistics has also invaded the fields of meteorology and astronomy.
The modern science of meteorology is to a great degree dependent upon
statistical methods for its existence. The methods which give weather
forecasting the accuracy it has today have been developed using
modern sample survey techniques. Thus, weather stations throughout the
United States are able to give us highly accurate predictions for their
individual areas. In addition, by suitable selection of gathering points
and proper treatment of the data, an over-all picture of the weather
for larger areas is pieced together. Again we may see statistical sampling
in action when we turn our attention to snow survey teams which
determine the amount of snow present in a given area and thus the
quantity of water to be drained from that area following a thaw. In the
more theoretical aspects of meteorology, statistical inference and
analysis are being used to develop new techniques for advancing the
field. In astronomy, statistics has long played a major role. One hundred
years ago the uncertainty in the measurement of the semimajor axis
of the earth's elliptical orbit was 1 part in 20. Today statistical methods
have reduced this uncertainty to 1 part in 10,000.

Statistics is now playing an important role in engineering. For
example, such topics as the study of heat transfer through insulating
materials per unit time, performance guarantee testing programs,
production control, inventory control, standardization of fits and
tolerances of machine parts, job analyses of technical personnel, studies
involving the fatigue of metals (endurance properties), corrosion studies,
time and motion studies, operations research and analysis, quality
control, reliability analyses, and many other specialized problems in
research and development make great use of probabilistic and statistical
methods.

Because the above problems are but a small portion of those to
which the science of statistics is being applied in industry, the reader
can readily appreciate that the application of statistical methods to the
field of engineering is not limited to a few areas but is general in nature.
As an indication of the wide scope of industrial statistics, P. L. Alger
of the General Electric Corporation has listed the following ten major
areas of application:

1. Defining the value of observations

2. Design of experiments

3. Detection of causes

4. Production quality control

5. Getting more out of the inspection dollar

6. Design specifications

7. Measurement of human attributes

8. Operational research

9. Market research, including opinion polling

10. Determining trends²



If applied statistics is to play a primary role in the future of
engineering, or, to be more general, in that of industry, it is quite evident
that there is a great need for specific training of personnel entering the
field. This training is needed for the young engineer as well as for the
young businessman, since each must be capable of dealing with
combinations of men and machines. Professor S. S. Wilks of Princeton
University has made this statement of the problem:

The statistical problems which the future scientist or engineer will
encounter will cut across traditional lines. Therefore, in order that he
may be properly equipped to deal with these problems, he should have a
fairly broad statistical training. The training should cover not only
statistical quality control methods as the term is now understood, but
the design of experiments, analysis of variance, and many other topics.
It should be built into the training of scientists and engineers, as
calculus is now made part of their basic education.³

Agricultural engineering, which combines the practices of engineering
and agriculture, has also benefited greatly from the use of statistical
methods. In this field, statistics has helped the researcher with such
varied projects as the testing of weed-control machinery, certain
economic aspects of farm electrification, comparison of various drying
methods for grain, determination of the effects of drying rate on
popcorn, irrigation research, roofing studies for farm buildings, and
methods of cultivation.

Statistics is also proving an important tool in food technology
research. Foods exhibit to a marked degree what is widely called
"biological variation." Their constitution is heterogeneous, and their
complexity is such that duplication is highly improbable. Food
properties are affected not only by the multiplicity of factors
influencing their growth but also by the infinite variety of processing
and storage conditions to which they may be subjected. Thus, it is
impossible to give a general answer to a question such as "What is the
moisture content of corn?" Before attempting to answer, one would first
have to ask "What variety . . . at what stage of its growth or
processing cycle . . . where was it grown?" and such questions. Having
obtained the necessary specifications, the food technologist might be
able to quote an average value. In short, he might specify a frequency
distribution of moisture content of sweet corn under the stated
conditions.

This type of problem was encountered by Bard (6) in his investigation
of certain palatability factors (tenderness, juiciness, and fiber
cohesiveness) of canned beef as conditions of time and temperature of
processing were varied. In his work, a statistical approach dictated the
design of the experiment, and analysis of variance was freely employed
to delineate between variation due to raw material and that caused by
processing treatments. Another example in food technology is provided
by Bernhard (7) in his comparison of several techniques of estimation
of the frequency of occurrence of insect fragments in cream-style
corn. The conventional methods utilize castor oil to separate the
insect fragments by flotation. Bernhard wished to compare the
efficiency of castor oil and lard oil, each at three different
temperatures for each of four different times of mixing oil and food
samples. Of course, repeated samples of any one set of determination
conditions could be expected to yield variable results for the number of
insect fragments present. Thus, statistical methods were required to
enable the variation among mixing times, temperatures, and oils to be
analyzed.

² P. L. Alger, "The growing importance of statistical methods in
industry," General Electric Review, Dec., 1948, p. 12.

³ S. S. Wilks, "Statistical training for industry," Analytical
Chemistry, Vol. 19, Dec., 1947, p. 955.

One of the most difficult areas of food research is that of evaluating
a food product in terms of consumer reaction. It is well known that
most objective tests of food acceptability (such as laboratory
measurements of shear strength, etc.) must be correlated with consumer
preference by means of taste-panel observations in order to achieve
firm standing. The problems of the "taste panel" are many. To what
extent is the taste panel representative of the entire population of
tasters? How is variation from sample to sample of a food product
distinguished from variation from taster to taster? How can subjective
evaluation of a particular property of food, for example, odor, be
separated from evaluation of another property, such as flavor? To what
extent can or should restrictions such as instructions to evaluate a
narrow area be placed upon the taster, in view of the fact that the
heart of the taste-panel system is the use of the integrated pattern of
individual reaction to a complex event?

These and many other such problems of food evaluation are not
entirely solved. Even the basic justification for the introduction of
statistical analysis is not always clear. For example, a group of
tasters may be asked to rank in order of merit five varieties of corn.
In searching for a method of evaluating results of this type of problem,
many workers have followed the procedure of allotting a number to each
rank, e.g., 5 for first, 4 for second, etc. These figures are then
treated as numbers and analyzed by analysis of variance to check for
significant variation among the five varieties. Such a procedure is not
entirely valid because analysis of variance can only be used with
numbers, and the ranking figures are not originally set down as
quantitative relative estimates of taste reaction. However, in view of
the lack of exact methods of analysis, the technique mentioned can
provide valuable assistance to the research worker who deals with food
products.

In the social sciences, statistical methods also find wide application.
Because of their vital interest in public opinion, the major political
parties have become acquainted with the statistician. In economic
research, statistical methods are almost indispensable. Economic laws
refer to mass or group phenomena, and the determination of these laws
often depends upon the judicious use of statistical techniques.

In marketing research, an objective may be increasing consumption
of those foods shown by nutritional studies to be inadequately supplied
in the average diet. The initial role of statistics here is merely one of
finding consumption per capita and comparing it with some goal. Of
course, the nature of the distribution of consumption per capita is as
important as the average. Another objective is the analyzing of
marketing methods in order to find the least costly way of doing the
job. As a result, a smaller portion of society's efforts need be
expended on product handling.

Measuring demand is another of the many difficult tasks in economics.
The research worker must have a knowledge of consumer preferences,
supply of money, its distribution, etc. In measuring supply, he must
have an intimate acquaintance with marketing functions, services, and
costs and be familiar with trends in operational efficiency, both
physical and managerial. Data on these particulars can only be digested
and made available through statistical procedures.

In  production  economics  probably  the  most  important  comparisons 
are  made  when  two  or  more  characteristics  are  simultaneously  studied 
or  measured.  This  involves  statistical  techniques  known  as  regression 
and  correlation.  These  tools  are  invaluable  to  the  economist.  By  using 
them,  one  factor  can  be  shown  in  its  relationship  with  other  factors.  For 
instance,  if  we  made  the  hypothesis  that  net  income  per  acre  becomes 
higher  as  farm  size  increases,  we  would  want  to  find  the  influence  of 
farm  size  on  net  income  per  acre.  We  might  then  collect  data  for  ten 
units  of  each  farm  size,  ranging  from  40  acres  to  perhaps  480  acres  with 
40-acre  increments.  These  data,  if  properly  obtained  in  accordance 
with  the  rules  of  statistical  procedure,  could  then  be  analyzed  to  aid 
the  researcher  in  making  a  contribution  to  the  theory  of  production 
economics. 

It is possible to go on almost indefinitely enumerating the fields
wherein statistics is being, or could be, applied. Statistics is utilized
for a systematic approach to problems in public health studies,
epidemiology, demography, biological assay, psychology, education,
sociology, and in various areas of home economics. Oddly enough,
statistics is not confined to the so-called scientific world, for it
also is applied in the arts. It has been used to aid in determining the
authorship of certain manuscripts by analyzing the length of sentences.
Authenticity of paintings has also been established by analyzing the
frequency of brush strokes.

Although statistics is very much in the realm of an applied science,
it has its theoretical basis in mathematics. Development of the
theoretical branch of the science is as important as that of the applied
branch if progress in the field is to continue. Unfortunately, there is a
gap between statistical theory and application much the same as exists
in other sciences. This gap is steadily being closed, but the job is far
from completion. Thus it is not surprising to find statistics in use as
both a science and an art. It is a science because its methods are
basically systematic and of wide application. And it is an art because
success in its application depends on the skill, special experience, and
knowledge of the person using it. The research worker will become more
appreciative of this fact as he gains a greater understanding of
statistical methods and their uses.

1.7     SUMMARY 

The scope of statistics might be summarized as concerned with: the
presentation and summarization of data, the estimation of population
quantities and the testing of hypotheses, the determination of the
accuracy of estimates, the measurement and study of variation, and the
design of experiments and surveys. Inherently and inextricably involved
in all of the above-mentioned areas is the process known as methods of
reduction of data, or the computational aspects of statistics.

The statistical method is one of the devices by which men try to
understand the generality of life. Out of a welter of single events,
human beings seek endlessly for general trends. Controlled, objective
methods by which group trends are abstracted from observations on
many separate individuals are called statistical methods. These methods
are especially adapted to the elucidation of quantitative data which
have been affected by many factors. Statistical methods are
fundamentally the same whether employed in the analysis of physical
phenomena, the study of educational measurements, the study of data
resulting from biological experiments, or the analysis of quantitative
material in economics. Agriculturists, biologists, chemists, physicists,
and other researchers all attempt to eliminate the many nuisance
factors which influence the variables under investigation and to
concentrate their attention upon one or two of the most powerful factors
affecting the phenomena being studied. Yet, many disturbances are
always present and thus statistical methods of analysis are vitally
necessary. Wherever there is a mass of numerical data that admits of
explanation, the statistician should consider its analysis his field of
endeavor.

To utilize statistical methods to advantage, a person should:

(1)  Be well versed in the subject matter of the field in which the
research is to be conducted.

(2)  Know how to organize masses of data for efficient tabulation
and how to lay out economical routines for handling data and
computation.

(3)  Know effective means of presenting data in tabular and
graphic form.

(4)  Have some knowledge of the mathematical theory of statistics
in order to have assurance there is a fair correspondence
between his data and the assumptions underlying the formulas
he uses.

(5)  Be acquainted with a variety of statistical techniques, the
limitations and advantages of each, the assumptions upon
which they are based, the place each occupies in a logical
analysis of the data, and the interpretations which can be
made from them.

Statistics, then, boils down to numerical results, the methods and
processes used in obtaining them, the methods and means for estimating
their reliability, and the drawing of inferences from these results.

During  the  past  half-century,  the  thinking  world  appears  to  have 
awakened  to  an  unusually  deep  appreciation  and  respect  for  numerical 
facts.  There  has  been  a  growing  tendency  to  reduce  observations  and 
accumulated  data  to  an  orderly  arrangement,  making  possible  the 
evaluation  of  results  by  means  of  a  systematic  method  of  analysis. 

Formerly,  many  persons  believed  statistical  analysis  could  be  used 
only  in  certain  highly  specialized  fields.  However,  more  and  more 
methods  of  statistical  analysis  are  finding  their  way  into  scientific 
workshops  in  all  fields.  This  is  due  largely  to  the  fact  that  some  of  the 
enthusiastic  supporters  of  statistical  methods  have  worked  faithfully 
to  develop  and  explain  methods  useful  to  and  usable  by  those  persons 
not  specifically  trained  in  higher  mathematics. 

In the field of statistical analysis advancement has been rapid in
recent years. Many useful methods are now available for the analysis
of data arising from different sources. A clear grasp of simple and
standardized statistical procedures will go far to elucidate principles of
experimentation. However, one must remember that these procedures
are in themselves only a means to a more important end. As fundamental
and pervasive as statistical thinking is in the modern world, it must
not be considered an end in itself. The statistical method is a tool for
organizing facts so they are rendered more available for study. A
statistical study can only describe what is; it cannot determine what
ought to be, except insofar as it may throw light upon probable
concomitants and consequences of certain situations. It is fatuous to
suppose the statistical method can provide mechanical substitutes for
thinking, although it is often an indispensable aid to thinking. Men
see increased prevalence of the statistical method in scientific studies;
and, sometimes, failing to grasp underlying reasons for this
development, they assume the use of tables, formulas, and numerical
summaries is a badge of respectability. As a result, some studies, truly
subjective in nature, are invested with a false show of objectivity.
Thus, a vast superstructure of computation is raised upon a foundation
inappropriate to such treatment. When such a picture is painted, it is
neither good statistics nor good philosophy.

Most statistical studies will not answer all the questions we would
like to have answered regarding a given problem. From the very nature
of statistical work, results are apt to be partial and fragmentary, rather
than complete and final. Therefore, the researcher must make up his
mind that questions must sometimes be left unanswered. He must also
on occasion freely admit his study has limitations. The researcher
should point out to his readers any shortcomings in his work and the
danger of attributing to his investigation more than is claimed for it.

It is also imperative that conclusions drawn from observational
results be based on a detailed knowledge of procedures employed in the
investigation. The interpretive function in statistical analysis is one of
the most important contributions of statistics, and the statistician
should plan experiments and investigations which will yield maximum
information and valid conclusions from scientific research data.
Inference from the particular to the general must be attended with some
degree of uncertainty, and research workers in all fields of science must
recognize the role statistics plays in this, the most important aspect of
research.

The role of statistics in research is, then, to function as a tool in
designing research, in analyzing its data, and in drawing conclusions
therefrom. A greater and more important role can scarcely be
envisioned. In utility to research, statistics is second only to the
mathematics and common sense from which it is derived. Clearly the
science of statistics cannot be ignored by any research worker even
though he may not have occasion to use applied statistics in all of its
detail and ramifications.

Problems 

1.1  Discuss the following terms or phrases: (a) observation and
description; (b) cause and effect; (c) analysis and synthesis;
(d) assumption, postulate, and hypothesis; (e) testing of hypotheses;
(f) deduction and induction.

1.2  What do you believe operations researchers mean by the phrase
"measure of effectiveness"?

1.3  Saaty (39), in a chapter entitled "Some Remarks on Scientific Method
in Operations Research," refers to: (a) the judgment phase, (b) the
research phase, and (c) the action phase. Give your interpretations
of these three phases. Then compare your views with those of Saaty.

1.4  Read Chapter 12, "Some Thoughts on Creativity," in Saaty (39).
Then prepare a brief report on your reactions to his ideas.

1.5  Prepare a report on the pros and cons of: (a) individual research and
(b) interdisciplinary team research.

1.6  Discuss the similarities and dissimilarities of pure and applied
research.

1.7  Prepare a report on the subject of "scientific method."

1.8  Prepare a report on your interpretation of "the role of statistics in
research."

1.9  By consulting the technical journals in your area of specialization,
prepare and submit a list of references (properly documented) which
illustrate the use of statistical methods.

1.10  Submit a list of publications (books, monographs, papers, etc.)
which you believe would be worthwhile additions to the references
presented with this chapter.

CHAPTER 2

MATHEMATICAL  CONCEPTS 

IT IS DIFFICULT to achieve a clear understanding of statistical methods
without discussing, to some extent, the underlying theory. Since the
theory of statistics is intimately associated with the theory of
probability and, further, since probability is an important branch of
mathematics, this implies that every student of statistical methods
should be willing to "use a little mathematics once in a while!"
Consequently, it seems desirable to present here a few basic
mathematical concepts, formulas, and techniques which may prove helpful
to the reader. These ideas will be presented as definitions and/or
theorems¹ (without proofs).

SET  THEORY 

The  subject  of  the  theory  of  sets  is  fundamental  in  mathematics.  In 
this  text,  however,  we  shall  be  concerned  only  with  a  few  basic  concepts 
which  are  useful  in  the  theory  of  probability. 

Definition 2.1  A set is a collection of elements.²

Definition 2.2  The universal set is the set consisting of all elements
under discussion. (NOTE: The universal set is sometimes referred to as
a space.)

Definition 2.3  The null set is the set containing no elements at all.

Definition 2.4  Associated with each set, A, is another set, A', called
the complement of A and defined to be the set consisting of all the
elements of the universal set which are not elements of A.

Definition 2.5  For any two sets, A and B, the union of A and B is the
set consisting of all elements which are either in A or in B or in both
A and B. The union of A and B is commonly denoted by $A \cup B$.

Definition 2.6  For any two sets, A and B, the intersection of A and B
is the set consisting of all elements which are both in A and B. The
intersection of A and B is commonly denoted by $A \cap B$ or by AB.

Theorem 2.1  If A and B are two sets which have no common elements,
then the set AB is the null set.
A useful device for illustrating the properties of the algebra of sets
is the Venn diagram. In such a diagram, the points interior to a
rectangle constitute the universal set. Arbitrary sets within the
universal set (that is, subsets of the universal set) will be
represented, for convenience, by the points interior to circles within
the rectangle. In Figure 2.1 the set A is shaded by vertical lines, and
the set B is shaded by horizontal lines. Since $AB \neq \emptyset$, that
is, does not equal the null set, AB appears as the crosshatched area.

FIG. 2.1. A simple Venn diagram.

¹ The expression "theorems" will be used in a very broad sense to
describe various properties, propositions, theorems, etc., which result
from the definitions. While not strictly correct, this procedure will
materially reduce the number of terms to be absorbed by the reader.

² The term element will be left undefined.

Probability theory, which we shall summarize in the next chapter,
depends on the number of elements in a set. We will denote the number
of elements in any arbitrary set A by n(A).

Theorem 2.2  If A and B have no elements in common,
$n(A \cup B) = n(A) + n(B)$.

Theorem 2.3  If A and B have no elements in common, $n(AB) = 0$.

Theorem 2.4  For arbitrary sets A and B, it is true that
$n(A \cup B) = n(A) + n(B) - n(AB)$.

NOTATION 

As in all subjects, the system of notation employed is a matter of
concern to the reader. Since statistics is so entwined with mathematics,
it is no surprise that problems of notation arise. In the remainder of
this book every attempt will be made to define and explain special
symbols and notation. However, at this point it seems appropriate to
mention some of the more frequently occurring signs and symbols.
Definition 2.7  The absolute value of a number, x, denoted by $|x|$, is
its numerical value neglecting its algebraic sign. For example,
$|-3| = 3$ and $|3| = 3$.

Definition 2.8  $x = y$ is read "x is equal to y."

Definition 2.9  $x \neq y$ is read "x is not equal to y."

Definition 2.10  $x \cong y$ is read "x is approximately equal to y."

Definition 2.11  $x < y$ is read "x is less than y."

Definition 2.12  $x \leq y$ is read "x is less than or equal to y."

Definition 2.13  $x > y$ is read "x is greater than y."

Definition 2.14  $x \geq y$ is read "x is greater than or equal to y."


Definition 2.15  $\sum_{i=1}^{n} Y_i = Y_1 + Y_2 + \cdots + Y_n$.

NOTE: The Greek capital letter sigma, $\Sigma$, is known as the
summation sign. Further, i is called the index of summation, while 1
and n are known as the limits of summation.


Theorem  2.5 


Theorem 2.6  $\sum_{i=1}^{n} cY_i = c \sum_{i=1}^{n} Y_i$, where c is a
constant.


Theorem  2.7 


Theorem  2.8 


+ 


,  + 


Theorem 2.9  $\left( \sum_{i=1}^{n} Y_i \right)^2 = \sum_{i=1}^{n} Y_i^2
+ 2 \mathop{\sum\sum}_{i<j} Y_i Y_j$.

NOTE: In this theorem the notation $\mathop{\sum\sum}_{i<j}$ is
interpreted to mean that we sum all possible products $Y_i Y_j$, letting
i and j go from 1 to n, subject only to the restriction that in any
particular term, $i < j$.
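
Two of the summation rules can be checked numerically: that a constant factor may be taken outside the summation sign, and that the square of a sum equals the sum of squares plus twice the sum of all products $Y_i Y_j$ with $i < j$. The sketch below is an illustration only, not part of the original text. The values in Y and the constant c are arbitrary assumptions, and `itertools.combinations` is used because it yields each pair with the first index smaller than the second, which is exactly the restriction $i < j$.

```python
from itertools import combinations

Y = [2.0, -1.0, 4.0, 3.0]      # arbitrary illustrative values Y_1, ..., Y_4
c = 5.0                        # an arbitrary constant

# Constant factor rule: sum of c*Y_i versus c times the sum of Y_i.
sum_of_scaled = sum(c * y for y in Y)
scaled_sum = c * sum(Y)

# Square of a sum: (sum Y_i)^2 versus sum Y_i^2 + 2 * sum over i<j of Y_i*Y_j.
square_of_sum = sum(Y) ** 2
sum_of_squares = sum(y * y for y in Y)
cross_products = sum(yi * yj for yi, yj in combinations(Y, 2))
expanded = sum_of_squares + 2 * cross_products
```

With these values both pairs of quantities agree exactly; with arbitrary decimal data they would agree only up to floating-point rounding.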


Definition 2.16  $\prod_{i=1}^{n} Y_i = (Y_1)(Y_2) \cdots (Y_n)$.

NOTE: In contrast to Definition 2.15, in which we introduced $\Sigma$
as the summation sign, we have here introduced the Greek capital letter
pi, $\Pi$, as the product sign.

Theorem 2.10  $\prod_{i=1}^{n} i = (1)(2) \cdots (n) = n!$

NOTE: The symbol n! is called n factorial, or factorial n.

Definition 2.17  $0! = 1$.

NOTE: This will prove useful later.
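
The product sign and the factorial can be illustrated in the same way. This is a minimal Python sketch under stated assumptions, not part of the original text: the list Y is arbitrary, `factorial_by_product` is a hypothetical helper name, and `math.prod` (available in Python 3.8 and later) plays the role of the product sign. Note that `math.prod` of an empty sequence returns 1, which matches the convention 0! = 1.

```python
import math

Y = [2, 5, 3]                          # arbitrary illustrative values
product_Y = math.prod(Y)               # (Y_1)(Y_2)(Y_3)


def factorial_by_product(n):
    """n! computed as the product of the integers 1 through n.

    For n = 0 the range is empty and the empty product is 1, so 0! = 1.
    """
    return math.prod(range(1, n + 1))


factorials = [factorial_by_product(n) for n in range(6)]
```

The list `factorials` holds 0! through 5! and can be compared with the library function `math.factorial`.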

PERMUTATIONS   AND  COMBINATIONS 

Permutations and combinations are concerned with the different
subgroups and arrangements that can be formed from a given set. A
permutation is a particular sequence (i.e., arrangement) of a given set
or subset of elements, while a combination is the set or subset without
reference to the order of the contained elements.


Definition 2.18  If an event, A, can occur in n(A) ways and if a
different event, B, can occur in n(B) ways, then the event "either A or
B" can occur in $n(A) + n(B)$ ways provided A and B cannot occur
simultaneously.

NOTE: You will notice the similarity between this definition and
Theorem 2.2.

Definition 2.19  If an event, A, can occur in n(A) ways and a subsequent
event, B, can occur in n(B) ways, then the event "both A and B" can
occur in $n(A) \cdot n(B)$ ways.

Definition 2.20  An r-permutation of n things is an ordered selection or
arrangement of r of them.

Definition 2.21  An r-combination of n things is a selection of r of
them without regard to order.

Definition 2.22  The number of different permutations which can be
formed from n distinct objects taken r at a time is
$P(n, r) = n(n-1) \cdots (n-r+1) = n!/(n-r)!$

Definition 2.23  The number of different permutations which can be
formed from n objects taken n at a time, given that $n_i$ are of type i
where $i = 1, 2, \ldots, k$, and $\sum n_i = n$, is
$P(n; n_1, n_2, \ldots, n_k) = n!/(n_1! \, n_2! \cdots n_k!)$.

Definition 2.24  The number of different combinations which can be
formed from n distinct objects taken r at a time is
$C(n, r) = n!/[r! \, (n-r)!]$.

Definition 2.25  $C(n, r) = 0$ for $r < 0$ and $r > n$.
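
The counting formulas for permutations and combinations translate directly into code. The following Python sketch is an illustration under stated assumptions and not part of the original text: the function names P, C, and `permutations_with_types` are hypothetical, chosen to mirror the notation P(n, r) and C(n, r), and the standard-library functions `math.perm` and `math.comb` (Python 3.8 and later) do the arithmetic. Because `math.comb` rejects a negative r, the convention that C(n, r) is zero outside 0 to n is handled explicitly.

```python
import math


def P(n, r):
    """Number of r-permutations of n distinct objects: n!/(n-r)!."""
    return math.perm(n, r)


def C(n, r):
    """Number of r-combinations of n distinct objects; 0 if r < 0 or r > n."""
    if r < 0 or r > n:
        return 0
    return math.comb(n, r)


def permutations_with_types(counts):
    """n!/(n_1! n_2! ... n_k!), where counts = [n_1, ..., n_k] sum to n."""
    result = math.factorial(sum(counts))
    for n_i in counts:
        result //= math.factorial(n_i)   # each division is exact
    return result


# Example: arrangements of the 11 letters of MISSISSIPPI,
# which has 1 M, 4 I's, 4 S's, and 2 P's.
arrangements = permutations_with_types([1, 4, 4, 2])
```

A useful cross-check is that each combination of r objects can be ordered in r! ways, so P(n, r) equals C(n, r) multiplied by r!.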

SOME  USEFUL   IDENTITIES  AND  SERIES 

In statistical work it is often necessary to sum a series of terms or
simplify a particular expression. A few of the more useful results are
given here for ready reference.

Theorem 2.11  $(a + b)^n = \sum_{r=0}^{n} C(n, r) \, a^{n-r} b^r$.

Theorem  2.12       If,  in  Theorem  2.11,  we  let  a=l  and  &  =  Z,  we  obtain 


Theorem 2.13  If, in Theorem 2.11, we let $a = q$ and $b = p = 1 - q$
where $0 < p < 1$, we obtain

$1 = \sum_{r=0}^{n} C(n, r) \, q^{n-r} p^r$.

This is a very useful expression in probability and statistics.
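
The binomial expansion and its special case with a = q and b = p = 1 - q can be verified numerically. The sketch below is a minimal illustration, not part of the original text; the function names and the particular values of a, b, n, and p are assumptions chosen for the example.

```python
import math


def binomial_expansion(a, b, n):
    """Sum over r = 0..n of C(n, r) * a**(n - r) * b**r."""
    return sum(math.comb(n, r) * a ** (n - r) * b ** r for r in range(n + 1))


def binomial_terms(n, p):
    """The n + 1 terms C(n, r) * q**(n - r) * p**r, with q = 1 - p."""
    q = 1.0 - p
    return [math.comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]


expansion = binomial_expansion(2, 3, 5)   # should equal (2 + 3)**5
terms = binomial_terms(10, 0.3)
total = sum(terms)                        # equals 1 up to rounding error
```

Integer inputs give an exact check of the expansion, while the terms for p = 0.3 are floating-point numbers whose sum matches 1 only to within rounding error.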


Definition 2.26  $e^x = \exp(x) = \sum_{i=0}^{\infty} x^i / i!$

Theorem 2.14  $(1 - x^n)/(1 - x) = \sum_{i=0}^{n-1} x^i$.

Theorem 2.15  $1/(1 - x)^n = \sum_{i=0}^{\infty} C(n + i - 1, i) \, x^i$.

Theorem 2.16  $\sum_{i=0}^{c} C(a, i) \cdot C(b, c - i) = C(a + b, c)$.

Theorem 2.17  $\sum_{i=1}^{n} i = n(n + 1)/2$.

Theorem 2.18  $\sum_{i=1}^{n} i^2 = n(n + 1)(2n + 1)/6$.

Theorem 2.19  $\sum_{i=1}^{\infty} i x^i = x/(1 - x)^2$  for $-1 < x < 1$.
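
Several of these identities are easy to confirm by direct computation. The following Python sketch is offered as an illustration only and is not part of the original text. The function names, the choice n = 20, and the value x = 0.5 are assumptions, and the infinite series is truncated at 200 terms, which is ample for this x but would not be for x close to 1 in absolute value.

```python
import math


def geometric_partial_sum(x, n):
    """Sum of x**i for i = 0..n-1, which equals (1 - x**n)/(1 - x) for x != 1."""
    return sum(x ** i for i in range(n))


def vandermonde_sum(a, b, c):
    """Sum over i = 0..c of C(a, i) * C(b, c - i), which equals C(a + b, c)."""
    return sum(math.comb(a, i) * math.comb(b, c - i) for i in range(c + 1))


n = 20
sum_integers = sum(range(1, n + 1))                 # n(n + 1)/2
sum_squares = sum(i * i for i in range(1, n + 1))   # n(n + 1)(2n + 1)/6

x = 0.5
exp_series = sum(x ** i / math.factorial(i) for i in range(30))
weighted_series = sum(i * x ** i for i in range(1, 200))   # x/(1 - x)**2
```

The finite sums are exact integer identities; the two truncated series agree with `math.exp(x)` and with x/(1 - x)² only approximately, to within rounding and truncation error.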

i— 1 

SOME    IMPORTANT   FUNCTIONS 

Some  mathematical  functions  not  always  presented  in  courses  in 
elementary  mathematics  are  of  great  interest  to  the  statistician.  Two  of 
these  will  be  presented  here  for  your  convenience. 

Definition 2.27  The gamma function, denoted by $\Gamma(p)$, is defined
by the integral

$\Gamma(p) = \int_0^{\infty} x^{p-1} e^{-x} \, dx$

for $p > 0$. An alternative form for this function is

$\Gamma(p) = 2 \int_0^{\infty} y^{2p-1} e^{-y^2} \, dy$,

where the transformation used was $x = y^2$.

Theorem 2.20  If, in Definition 2.27, we let $p = n$ where n is a
positive integer, we obtain
$\Gamma(n) = (n-1) \cdot \Gamma(n-1) = (n-1)!$

Theorem 2.21  $\Gamma(1/2) = \sqrt{\pi} = (\pi)^{1/2}$.

Definition 2.28  The beta function, denoted by $\beta(p, q)$, is defined
by the integral

$\beta(p, q) = \int_0^1 x^{p-1} (1 - x)^{q-1} \, dx$

for $p > 0$ and $q > 0$. An alternative form for this function is

$\beta(p, q) = 2 \int_0^{\pi/2} \sin^{2p-1}\theta \, \cos^{2q-1}\theta \, d\theta$,

where the transformation used was $x = \sin^2 \theta$.

Theorem 2.22  $\beta(p, q) = \beta(q, p)$.

Theorem 2.23  $\beta(p, q) = \Gamma(p) \, \Gamma(q) / \Gamma(p + q)$.
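
The gamma and beta functions can be explored numerically. The sketch below is a minimal Python illustration and not part of the original text. It relies on the standard-library function `math.gamma`; the names `beta` and `beta_by_integration` are hypothetical, `beta` uses the standard relation between the beta and gamma functions, and the midpoint rule in `beta_by_integration` is a deliberately crude check that is adequate only when p and q are at least 1, so that the integrand stays bounded on the interval from 0 to 1.

```python
import math


def beta(p, q):
    """Beta function computed as gamma(p) * gamma(q) / gamma(p + q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def beta_by_integration(p, q, steps=100_000):
    """Midpoint-rule approximation of the integral of
    x**(p - 1) * (1 - x)**(q - 1) over 0 <= x <= 1.

    Intended only as a rough check for p >= 1 and q >= 1.
    """
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h                 # midpoint of the k-th subinterval
        total += x ** (p - 1) * (1 - x) ** (q - 1)
    return total * h


gamma_5 = math.gamma(5)                   # should equal 4! = 24
gamma_half = math.gamma(0.5)              # should equal the square root of pi
beta_2_3 = beta(2, 3)                     # equals 1/12
beta_2_3_numeric = beta_by_integration(2, 3)
```

Comparing `beta_2_3` with `beta_2_3_numeric` shows the integral definition and the gamma-function relation giving the same value to several decimal places, and swapping the two arguments of `beta` leaves the result unchanged.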

MATRICES 

Many of the methods to be discussed in this book depend on the
theory of linear statistical models. This theory is most expeditiously
handled in terms of matrix algebra. Therefore, it is appropriate that
the reader be made aware of the basic concepts. As in the preceding
sections of this chapter, definitions and theorems will be stated without
discussion.

Definition 2.29  A matrix A of dimension $r \times c$ is a rectangular
array of elements $a_{ij}$ arranged in r rows and c columns:

$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2c} \\
\vdots & \vdots &        & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rc}
\end{bmatrix}$

If it is necessary to emphasize the dimension, we shall write $A_{rc}$
instead of A.

Definition 2.30  If A is of dimension $n \times 1$, it is called an
$n \times 1$ vector.

Definition 2.31  $A = B$ when and only when A and B are of the same
dimension and $a_{ij} = b_{ij}$ for all i and j.

Definition 2.32  The product of a matrix A and a scalar (ordinary)
number k is a matrix B where $b_{ij} = k a_{ij}$ for all i and j. That
is, $kA = Ak = B$.

Definition 2.33  The sum of two matrices, A and B, can be defined only
when A and B are of the same dimension. Then $A + B = C$ where
$c_{ij} = a_{ij} + b_{ij}$.

Definition 2.34  The product of two matrices, say AB, can be defined
only when the number of columns in A equals the number of rows in B.
Then $AB = C$ where $c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}$, m being the
common number of columns in A and rows in B.


Definition  2.35 


Theorem  2.24 
Theorem  2.25 
Theorem  2.26 
Definition  2.36 
Theorem  2.27 

Definition  2.37 


Definition  2.38 

Theorem  2.28 
Definition  2.39 

Definition  2.40 
Definition  2.41 
Definition  2.42 


«=1 

NOTE:  We  must  be  very  careful  of  the  order  of  the 
factors  when  multiplying  one  matrix  by  another. 
Even  if  AB  and  BA  are  both  defined,  they  are  not 
necessarily  equal. 

The  transpose  of  a  matrix  A  of  dimension  rXc  is  de 
noted  by  A',  where  Af  is  a  matrix  of  dimension.  eXr 
in  which  a'-/  =  c&yi.  That  is,  the  rows  of  A'  are  the  col 
umns  of  A  and  the  columns  of  Af  are  the  rows  of  A. 


=  AA,   A* 


If  r  =  c,  A  is  called  a  square  matrix, 
For  a  square   matrix   A,  we   can   write 
~AAA,  etc. 

In  a  square  matrix  of  dimension  nXn,  the  elements 
an,  #22,  -  •  •  ,  an«,  form  the  main  diagonal  and  are 
known  as  diagonal  elements. 

A  square  matrix  which  is  symmetric  with  respect  to 
its  main  diagonal  is  called  a  symmetric  matrix. 
For  a  symmetric  matrix,  A'  =  A. 

A  symmetric  matrix  in  which  a^'  =  0  for  all  i^j  is 
called  a  diagonal  matrix. 

A  diagonal  matrix  in  which  a«=l  for  all  i  is  called  a 
unit  (or  an  identity)  matrix,  and  will  be  denoted  by  /. 
A  matrix  having  all  its  elements  equal  to  zero  is  called 
the  null  matrix,  and  will  be  denoted  by  0. 
The  determinant  of  a  square  matrix  A  of  dimension 
denoted  by  \A\,  is  defined  by 


where  the  second  subscripts  n,  r%,  -  •  •  ,  rn  run  through 
all  the  n\  possible  permutations  of  the  numbers 
1,  2,  -  -  -  ,  n,  and  the  sign  of  each  term  (either  + 
or  — )  is  determined  according  to  a  well-defined  rule. 


24 
NOTE: If $A$ is of dimension $2 \times 2$, then

$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$

Theorem 2.29  For any square matrix $A$, $|A| = |A'|$.

Theorem 2.30  If two rows (or columns) of a square matrix are interchanged, the determinant changes its sign.

Theorem 2.31  If two rows (or columns) of a square matrix are identical, the determinant is 0.

Theorem 2.32  If $A$, $B$, and $C$ are square matrices such that $AB = C$, then $|A| \cdot |B| = |C|$.

Theorem 2.33  If a multiple of one row (column) is added to another row (column) of a square matrix, the determinant is unchanged.

Definition 2.43  For any arbitrary matrix $A$, the determinant of any square submatrix of $A$ is called a minor of $A$.

Definition 2.44  For a square matrix $A$, the minor obtained by deleting the $i$th row and $j$th column, multiplied by $(-1)^{i+j}$, is known as the cofactor of $a_{ij}$. We shall denote the cofactor of $a_{ij}$ by $\operatorname{cof} a_{ij}$.

Theorem 2.34  For a square matrix $A$ of dimension $n \times n$, the determinant $|A|$ may be found by evaluating

$$|A| = \sum_{j=1}^{n} a_{ij} (\operatorname{cof} a_{ij})$$

for any fixed row $i$.

Definition 2.45  If, for a square matrix, $|A| \neq 0$, then $A$ is of rank $n$ and $A$ is said to be nonsingular.

Definition 2.46  For a nonsingular square matrix $A$, the inverse of $A$ is denoted by $A^{-1}$ and is defined by


Theorem 2.35  For a nonsingular square matrix $A$, it is true that
Theorem 2.36  For a nonsingular square matrix $A$, it is true that

i    A      i """""  *  w\   \    A  *~"  *  i 

V.**    J  **"**   V  **         J     * 
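Several of the matrix facts stated in this section (the order of factors in a product, the determinant of a transpose, and the determinant of a product) can be illustrated with small integer matrices. The following is a minimal sketch in plain Python; the helper names and the two example matrices are hypothetical, and the recursive cofactor expansion is practical only for small matrices.

```python
def transpose(a):
    # Rows of the transpose are the columns of a
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Defined only when the columns of a match the rows of b
    if len(a[0]) != len(b):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det(a):
    # Cofactor expansion along the first row
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

print(matmul(A, B), matmul(B, A))           # AB and BA differ here
print(det(A), det(transpose(A)))            # equal determinants
print(det(matmul(A, B)), det(A) * det(B))   # determinant of a product
```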

LINEAR  EQUATIONS 

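Many times in statistical work we find it necessary to discuss systems of linear equations such as:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= y_2 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= y_n \end{aligned} \tag{2.1}$$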
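The matrix notation introduced in the preceding section gives us an extremely concise method of representing such systems. For example, it is clear that

$$AX = Y \tag{2.2}$$

is the same as Equation (2.1) if

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}, \quad X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{and} \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$

Theorem 2.37  If $A$ in Equation (2.2) is nonsingular, then $X = A^{-1}Y$. That is,

$$x_j = \frac{1}{|A|} \sum_{i=1}^{n} y_i (\operatorname{cof} a_{ij})$$

for $j = 1, 2, \ldots, n$.

NOTE: Another way of writing this is

$$x_j = \frac{\left|\text{matrix } A \text{ in which } a_{ij} \text{ has been replaced by } y_i,\ i = 1, 2, \ldots, n\right|}{|A|}.$$

That is,

$$x_j = \frac{1}{|A|} \begin{vmatrix} a_{11} & \cdots & a_{1,j-1} & y_1 & a_{1,j+1} & \cdots & a_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,j-1} & y_n & a_{n,j+1} & \cdots & a_{nn} \end{vmatrix}, \quad j = 1, 2, \ldots, n.$$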
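The determinant form of the solution in Theorem 2.37 translates directly into a short routine: replace the jth column of A by Y, take the determinant, and divide by |A|. The sketch below assumes integer coefficients and uses exact fractions; the function names and the 2 x 2 example system are hypothetical and are not taken from the problems that follow.

```python
from fractions import Fraction

def det(a):
    # Cofactor expansion along the first row (suitable for small matrices only)
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total

def solve_by_determinants(a, y):
    # x_j = |A with column j replaced by y| / |A|, valid when |A| != 0
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular; no unique solution")
    solution = []
    for j in range(len(a)):
        a_j = [row[:j] + [y[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution

# Hypothetical system: 2*x1 + x2 = 5 and x1 + 3*x2 = 10
print(solve_by_determinants([[2, 1], [1, 3]], [5, 10]))
```

For systems of more than a few equations, elimination methods are far cheaper than repeated determinant evaluation; the routine above is meant only to mirror the theorem.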


Problems 


2.1  Consider a box of resistors that are color coded (red, black, or yellow) according to resistance rating. Suppose that all red (R) resistors and some of the black (B) resistors are manufactured by company E. The remainder of the black resistors are manufactured by company F, while the yellow (Y) resistors are manufactured by company G. The universal set consists of all the resistors in the box. Letting R stand for the set of all red resistors, B stand for the set of all black resistors, and so on, write as many equations and inequations as you can to describe the relations existing among the various sets.

2.2  List all subsets of the set {X, Y, Z}.

2.3  Consider the space consisting of the 26 lower case letters of the alphabet. If the sets A, B, and C are defined as A = {a, b, c, d, e}, B = {b, d, f, h, j}, and C = {c, f, i, l, m}, find:

(a)  $A \cup B$, $B \cup C$, $A \cup B \cup C$

(b)  $AB$, $BC$, $ABC$

(c)

(d)  $(A \cup B)(A \cup B')$

(e)  $(A \cup B)(A' \cup B)($

2.4  How many different subsets are there in a set containing n distinct
elements?

2.5  Draw  Venn  diagrams  for  the  following  and  shade  the  indicated  area: 

(a)  $A \cup A'BC$

(6) 

(c) 


2.6  Referring  to  Problem  2.3,  give  the  number  of  elements  in  each  set 
discussed. 

2.7  If  A"i=*4,  A^aa—  3,  AT'a  =  i,  X"4  =  7,  1^=8,  Fo  =  2,  F3  =  —  1,  and  r4  =  3, 
find: 


(4 
T,  -Y* 
*—L 


t— 1 

2.8        Given  the  following  observations: 

Fm  -  4  Fm  -  3  Fun  -  0 

—  3  Fm  —  —  3  Fin*  —  9 

«  —  1  Fm  «*  0  F*M  -•  4 

«  -  8  F4ftl  «  14  K4M  -  0 

«*  22  F4t«  *•  7  F4is  «-•  0 

find: 

433 


1— I 
3-1   &-1

2.9  Evaluate:  P(7,  3),  P(5,  5),  P(17,  2),  P(7,  4),  P(7,0). 

2.10  Evaluate:  P(ll;  2,  2,  5,  2),  P(8;  5,  3). 

2.11  Evaluate: C(7, 3), C(5, 5), C(17, 2), C(7, 4), C(7, 0), C(8, 5), C(8, -1),
C(7, 8).

2.12  Using the binomial expansion (Theorem 2.11), and letting a = b = 1,
verify that $2^5 = 32$. Write out each term and show its value.

2.13  Expand  (i+f)4. 

2.14  Find 

]C  C"(7>  y)'C(5,  4  —  a:  —  y)- 

v—o 

2.15  Evaluate: 

r  °° 

(a)       I      ics/3e  2ic^^ 
*/  o 


rl 

J  0 

«  oo 

I 

•^  o 


2.16  Show that C(n, r) = C(n - 1, r) + C(n - 1, r - 1).

2.17  A  lot  contains  100  items.  A  single  sample  of  two  items  is  to  be  selected. 
How  many  differently  constituted  samples  are  possible? 

2.18  If 


a 


find: A + B, A - B, and AB.
2.19     If


find AB and

2.20  Find  the  transposes  of  the  matrices  given  in  Problems  2.18  and  2.19. 
Also   find  the  transposes  of  the  solution  matrices  in  each  of  those 
problems. 

2.21  Find  the  inverses  of  the  matrices  in  Problem  2.18. 

2.22  Transform  the  matrices  in  Problem  2.18  into  diagonal  form. 

2.23  Evaluate |A| by expanding by minors (see Theorem 2.34) for

$$A = \begin{bmatrix} 1 & -3 & 1 \\ 2 & 1 & 2 \\ 1 & 5 & 3 \end{bmatrix}.$$

Is A singular?

2.24  Solve the following set of equations using determinants.

10 
CHAPTER 3

A  SUMMARY  OF  BASIC  THEORY  IN 
PROBABILITY  AND  STATISTICS 

As  INDICATED  at  the  beginning  of  Chapter  2,  a  proper  appreciation  of 
statistical  methods  is  difficult  without  an  understanding  of  the  associated 
theory.  If  we  do  not  have  sufficient  grounding  in  the  theory  of  prob 
ability  and  statistics,  the  possibility  of  misapplication  of  methods 
based  on  this  theory  is  enhanced. 

PROBABILITY 

In general, statistics enters into scientific method through experimentation or observation. Any investigation is only a means to an end. It is a device for testing a stated hypothesis or for acquiring an amount of knowledge, however small, from which a conclusion may be drawn. Most statements resulting from scientific investigations are only inferences. They are uncertain in character. The measurement of this uncertainty by use of the theory of probability is one of the most important contributions of statistics.

Probability is just a measure of the likelihood of occurrence of a chance event. A fairly simple definition of probability, generally referred to as the classical definition of probability, is:

Definition 3.1  If an event can occur in N mutually exclusive and equally likely ways, and if n of these possess a characteristic E, then the probability of E occurring is the fraction n/N. This is customarily written P(E) = n/N.

There is a natural relation between set theory and probability theory which is easily recognized once we adjust to a change in language. In probability, the universal set is called the sample space, each subset is called an event, and an element is referred to as a sample point. Then, the definition of probability is:

Definition 3.2  The probability of occurrence of the event A is the ratio of the number of sample points in the event A to the number of sample points in the sample space. Symbolically, P(A) = n(A)/N where n(A) is the number of sample points in the event A, and N is the number of sample points in the sample space.

Some  additional  expressions  encountered  in  probability  and  statistics 
are  the  words  experiment  and  outcome. 

Definition 3.3  An experiment is any well-defined action.

Definition 3.4  Each possible result of an experiment is called an outcome (of the experiment).

The tie-in between the two definitions just given and the ideas expressed earlier is as follows: An outcome is a sample point, the totality of outcomes is the sample space, and an event is a set of outcomes.

Definition 3.5  A random (chance) variable is a numerically valued function defined over a sample space. It is a rule which assigns a numerical value to each outcome of an experiment.

Definition 3.6  A discrete random variable is one which can take on only a finite or a denumerable number of values.

Definition 3.7  A continuous random variable is one which can take on a continuum of values.

NOTE: A one-dimensional continuous random variable is most easily thought of as one which can take on any value within a specified interval along a straight line.

The  definitions  of  probability  advanced  in  the  preceding  paragraphs 
are  such  that  difficulties  are  sometimes  encountered  in  their  use.  For 
example,  it  is  not  always  easy  to  tell  if  two  events  are  equally  likely. 
Then,  too,  how  do  we  handle  the  concept  of  an  experiment  that  can  be 
performed  infinitely  many  times? 

Before formulating a new definition that will give us greater flexibility, let us examine some preliminary ideas. Consider a random experiment $\mathcal{E}$ that may be repeated many times under uniform conditions. Each time the experiment is performed, observe whether an event E does or does not take place. In the first n performances of $\mathcal{E}$, E will occur a certain number of times, say f. We shall call the ratio f/n the relative frequency of E in the first n performances of the experiment $\mathcal{E}$. It will be observed that f/n will generally tend to become more or less constant for large n. This phenomenon is sometimes referred to as statistical regularity. It is now conjectured that for given $\mathcal{E}$ and E we should be able to find a number P such that as n, the number of performances of $\mathcal{E}$, gets very large, the ratio f/n should be approximately equal to P.

Definition 3.8  "Whenever we say that the probability of an event E with respect to an experiment $\mathcal{E}$ is equal to P, the concrete meaning of this assertion will thus simply be the following: In a long series of repetitions of $\mathcal{E}$, it is practically certain that the (relative) frequency of E will be approximately equal to P."1
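Statistical regularity is easy to observe by simulation. The sketch below is only an illustration under stated assumptions: the experiment is a simulated trial in which the event E occurs with a chosen probability p, the function name is hypothetical, and Python's pseudo-random generator stands in for a physical experiment.

```python
import random

def relative_frequency(n, p=0.5, seed=0):
    # Perform n simulated trials; E occurs on a trial with probability p.
    rng = random.Random(seed)
    f = sum(1 for _ in range(n) if rng.random() < p)
    return f / n  # the ratio f/n

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

As n grows, the printed ratios should settle near the value of p used in the simulation, although any single run can deviate from it.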

Theorem 3.1  For any event E, it is true that $0 \le P(E) \le 1$.

Theorem 3.2  P(E) + P(not E) = 1.

NOTE: Using the set notation introduced in Chapter 2, this would appear as P(E) + P(E') = 1.

1 H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, N.J., 1946, p. 148.


Theorem 3.3  For arbitrary events A and B, $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(AB)$.

NOTE: For three events, this extends to $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)$. The extension to more than three events can be made quite easily.

Theorem 3.4  If A and B are mutually exclusive (i.e., have no elements in common), then $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$.

NOTE: For three events that are pairwise disjoint (i.e., mutually exclusive), this extends to $P(A \cup B \cup C) = P(A) + P(B) + P(C)$. The extension to more than three events is obvious.

Definition 3.9  Let A be an event in an arbitrary sample space such that $P(A) \neq 0$. Let B be any event in the same sample space. Then, the conditional probability that B occurs, knowing that A has occurred, is defined by $P(B \mid A) = P(AB)/P(A)$.

Theorem 3.5  For arbitrary events A and B, $P(A \text{ and } B) = P(AB) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$.

NOTE: For three events, this extends to $P(ABC) = P(A) \cdot P(B \mid A) \cdot P(C \mid AB)$. It should be realized that other permutations of the factors and the letters are possible. The extension to more than three events is obvious.

Definition 3.10  Two events A and B are said to be statistically independent if $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$. This is equivalent to saying that A and B are statistically independent if $P(AB) = P(A) \cdot P(B)$.

NOTE: Three events (A, B, and C) are mutually independent if $P(A \mid B) = P(A)$, $P(A \mid BC) = P(A)$, $P(AB \mid C) = P(AB)$, and so on for all possible events. This is equivalent to saying that A, B, and C are mutually independent if A, B, and C are pairwise independent [that is, $P(AB) = P(A) \cdot P(B)$, $P(AC) = P(A) \cdot P(C)$, and $P(BC) = P(B) \cdot P(C)$], and if $P(ABC) = P(A) \cdot P(B) \cdot P(C)$.

Theorem 3.6  If A and B are statistically independent, $P(AB) = P(A) \cdot P(B)$.

NOTE: For three events that are mutually independent, this extends to $P(ABC) = P(A) \cdot P(B) \cdot P(C)$. The extension to more than three events is obvious.

Theorem 3.7  For any events A and B,

$$P(A \cup B) = 1 - P(\text{not } A) \cdot P(\text{not } B \mid \text{not } A) = 1 - [\{1 - P(A)\} \cdot \{1 - P(B \mid \text{not } A)\}].$$
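The addition rule of Theorem 3.3 and the complement form of Theorem 3.7 can be checked by direct enumeration of a small sample space. The sketch below assumes a hypothetical experiment (two fair dice, 36 equally likely sample points) and two hypothetical events; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair dice
space = list(product(range(1, 7), repeat=2))

def prob(event):
    # Ratio of sample points in the event to sample points in the space
    return Fraction(sum(1 for s in space if event(s)), len(space))

def cond(event, given):
    # Conditional probability P(event | given) = P(both) / P(given)
    return prob(lambda s: event(s) and given(s)) / prob(given)

A = lambda s: s[0] == 6        # first die shows a six
B = lambda s: sum(s) >= 10     # total is at least ten
not_A = lambda s: not A(s)
not_B = lambda s: not B(s)

p_union = prob(lambda s: A(s) or B(s))
by_addition = prob(A) + prob(B) - prob(lambda s: A(s) and B(s))
by_complement = 1 - prob(not_A) * cond(not_B, not_A)
print(p_union, by_addition, by_complement)
```

All three printed values should coincide for any choice of events on this space, provided P(not A) is not zero.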
Theorem 3.8  If A and B are statistically independent, Theorem 3.7 becomes

$$P(A \cup B) = 1 - P(\text{not } A) \cdot P(\text{not } B) = 1 - [\{1 - P(A)\} \cdot \{1 - P(B)\}].$$

NOTE: This can easily be extended to k events that are mutually independent by writing

$$P(E_1 \cup E_2 \cup \cdots \cup E_k) = 1 - [\{1 - P(E_1)\} \cdot \{1 - P(E_2)\} \cdots \{1 - P(E_k)\}].$$

Theorem 3.9  Let $H_1, H_2, \ldots, H_n$ be mutually exclusive events whose union is the sample space. Let E be an arbitrary event in the same sample space such that $P(E) \neq 0$. Then

$$P(E) = P(H_1) \cdot P(E \mid H_1) + P(H_2) \cdot P(E \mid H_2) + \cdots + P(H_n) \cdot P(E \mid H_n).$$

Theorem 3.10  Referring to Theorem 3.9 and invoking the fact that $P(H_1 \mid E) = P(H_1 E)/P(E)$, it is seen that

$$P(H_1 \mid E) = \frac{P(H_1) \cdot P(E \mid H_1)}{P(H_1) \cdot P(E \mid H_1) + \cdots + P(H_n) \cdot P(E \mid H_n)}$$

and similar results hold for $H_2, \ldots, H_n$. This theorem is known as Bayes' theorem.

Definition 3.11  Consider an experiment with only two possible outcomes, that is, A and A'. If at each performance of the experiment (i.e., each trial), P(A) remains the same, then the repeated trials are known as Bernoulli trials.

NOTE: When discussing Bernoulli trials, it is customary to refer to one of the two possible outcomes as a success and to the other as a failure.

Theorem 3.11  Let $b(x; n, p)$ denote the probability that n Bernoulli trials will result in exactly x successes and $n - x$ failures when the probability of a success at each trial is p and the probability of a failure at each trial is $q = 1 - p$. Then $b(x; n, p) = C(n, x)\, p^x q^{n-x}$.

NOTE: Probabilities given by $b(x; n, p)$ are often referred to as binomial probabilities. Evaluation of binomial probabilities can be a tedious task. However, tables (10, 11) are available and can be used to good advantage.
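Bayes' theorem (Theorem 3.10) lends itself to a short computation: multiply each prior probability by the corresponding conditional probability of E, and divide by the total. The sketch below is a minimal illustration; the function names and the numbers (two hypothetical sources with priors 3/5 and 2/5 and conditional probabilities 1/100 and 3/100) are assumptions made for the example, not data from the text.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    # P(E) as the sum of P(H_i) * P(E | H_i) over mutually exclusive,
    # exhaustive events H_i
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    # P(H_i | E) for every i; requires P(E) != 0
    p_e = total_probability(priors, likelihoods)
    if p_e == 0:
        raise ValueError("P(E) must be nonzero")
    return [p * l / p_e for p, l in zip(priors, likelihoods)]

priors = [Fraction(3, 5), Fraction(2, 5)]            # hypothetical P(H_1), P(H_2)
likelihoods = [Fraction(1, 100), Fraction(3, 100)]   # hypothetical P(E | H_i)

print(total_probability(priors, likelihoods))
print(bayes_posteriors(priors, likelihoods))
```

The posterior probabilities always sum to 1, which gives a quick sanity check on any hand calculation.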
MATHEMATICAL EXPECTATION

Definition 3.12  Consider the function $b(x; n, p)$ introduced in Theorem 3.11. The expected value of x, which will be denoted by $E[x]$, is defined by $E[x] = np$. That is, the expected number of successes in n trials is defined to be np, even if it may be impossible to observe such a number.
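Where tables are not at hand, binomial probabilities can be computed directly from the formula C(n, x) p^x q^(n-x), and the value np can be compared with the probability-weighted average of x. The following sketch uses Python's standard `math.comb`; the function names and the chosen values of n and p are hypothetical.

```python
from math import comb, isclose

def binomial_pmf(x, n, p):
    # Probability of exactly x successes in n Bernoulli trials
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_mean(n, p):
    # Probability-weighted average of x over x = 0, 1, ..., n
    return sum(x * binomial_pmf(x, n, p) for x in range(n + 1))

n, p = 10, 0.3
print(binomial_pmf(2, 4, 0.5))                            # C(4, 2) / 2**4
print(sum(binomial_pmf(x, n, p) for x in range(n + 1)))   # should be close to 1
print(binomial_mean(n, p), n * p)                         # weighted average against np
```

With n = 10 and p = 0.3 the expected number of successes is 3, a value that can actually occur; with p = 0.25 it would be 2.5, which cannot be observed in any single set of trials.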

PROBABILITY   DISTRIBUTIONS 


Definition 3.13  For any random variable X, we will denote $P(X \le x)$ by $F(x)$. Further, $F(x)$ will be referred to as the cumulative distribution function (c.d.f.) or, simply, as the distribution function (d.f.) of the random variable X.

Definition 3.14  If X is a discrete random variable, we will define the probability function (p.f.) of the random variable X to be $f(x) = P(X = x)$.

Definition 3.15  If X is a continuous random variable, we will define the continuous probability density function of the random variable X to be $f(x) = dF(x)/dx$.

Theorem 3.12  For a discrete random variable X, $F(x) = \sum_{y \le x} f(y)$.

NOTE: If $F(x)$ is defined, $f(x)$ may be obtained by differencing. However, close attention must be given to equality and inequality signs.

Theorem 3.13  For a continuous random variable X,

$$F(x) = \int_{-\infty}^{x} f(y)\,dy.$$

Theorem 3.14  $F(x)$ has the following properties:

(1)  $F(-\infty) = 0$.

(2)  $F(\infty) = 1$.

(3)  $F(a) \le F(b)$ if $a < b$.

Theorem 3.15  $f(x)$ has the following properties:

(1)  $f(x) \ge 0$.

(2)  $\sum_{\text{all } x} f(x) = 1$ if X is a discrete random variable, or

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

if X is a continuous random variable.

Theorem 3.16  For a discrete random variable X,

$$P(a < X \le b) = F(b) - F(a) = \sum_{a < x \le b} f(x).$$

Theorem 3.17  For a continuous random variable X,

$$P(a < X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx.$$
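For a discrete random variable, the relation between the probability function and the cumulative distribution function can be made concrete with a few lines of code. The sketch below assumes the convention F(x) = P(X <= x) and a hypothetical probability function (a fair six-sided die); the function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical probability function: X is the face shown by a fair die
pf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    # F(x) = P(X <= x): sum the probability function over values not exceeding x
    return sum(p for value, p in pf.items() if value <= x)

def prob_interval(a, b):
    # P(a < X <= b) = F(b) - F(a); note the strict inequality at a
    return cdf(b) - cdf(a)

print(cdf(3))               # P(X <= 3)
print(prob_interval(2, 5))  # P(X = 3, 4, or 5)
print(cdf(4) - cdf(3))      # differencing recovers f(4)
```

The last line shows the differencing mentioned in the NOTE to Theorem 3.12: subtracting F at the next lower possible value recovers the probability function at a point.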

Definition 3.16  For two random variables X and Y, we will denote $P(X \le x, Y \le y)$ by $F(x, y)$. Further, $F(x, y)$ will be referred to as the joint cumulative distribution function of X and Y.

Definition 3.17  If X and Y are discrete random variables, the joint probability function of X and Y will be denoted by $f(x, y) = P(X = x, Y = y)$.

Definition 3.18  If X and Y are continuous random variables, the joint probability density function of X and Y will be denoted by

$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y}.$$

Theorem 3.18  For discrete random variables X and Y,

$$F(x, y) = \sum_{s \le x} \sum_{t \le y} f(s, t).$$

Theorem 3.19  For continuous random variables X and Y,

$$F(x, y) = \int_{-\infty}^{x} ds \int_{-\infty}^{y} f(s, t)\,dt.$$


Theorem 3.20  $F(x, y)$ has the following properties:

(1)  $F(-\infty, y) = F(x, -\infty) = F(-\infty, -\infty) = 0$.

(2)  $F(\infty, \infty) = 1$.

(3)  $F(\infty, y) = F_2(y)$, which is the marginal cumulative distribution function of Y.

(4)  $F(x, \infty) = F_1(x)$, which is the marginal cumulative distribution function of X.

Theorem 3.21  $f(x, y)$ has the following properties:

(1)  $f(x, y) \ge 0$.

(2)  $\sum_{\text{all } x} \sum_{\text{all } y} f(x, y) = 1$ or $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$, depending on whether X and Y are discrete or continuous.

Theorem 3.22  If X and Y are discrete, then

$$P(a < X \le b,\ c < Y \le d) = F(b, d) - F(b, c) - F(a, d) + F(a, c).$$

Theorem 3.23  If X and Y are continuous, then

$$P(a < X \le b,\ c < Y \le d) = F(b, d) - F(b, c) - F(a, d) + F(a, c) = \int_a^b dx \int_c^d f(x, y)\,dy.$$

Definition 3.19  Associated with the marginal c.d.f.'s in Theorem 3.20, we have marginal p.d.f.'s (or p.f.'s) denoted by $f_1(x)$ and $f_2(y)$, respectively.

Definition 3.20  Conditional p.d.f.'s (or p.f.'s) and c.d.f.'s are defined as follows:

(i) 

(2) 
(3) 
(4) 

Theorem 3.24  If X and Y are statistically independent, $f(x, y) = f_1(x) \cdot f_2(y)$ and $F(x, y) = F_1(x) \cdot F_2(y)$.

NOTE: All the definitions and theorems given for two random variables may easily be extended to three or more random variables.

EXPECTED  VALUES 

To  aid  in  the  description  of  probability  distributions,  it  is  helpful  to 
know  something  about  their  properties.  Of  special  importance  are  those 
properties  associated  with  the  concept  of  mathematical  expectation. 

The expected value of any function of a random variable is defined as the weighted average (weighted by the probability of its occurrence) of the function over all possible values of the variable. Since expected values are used so much in statistics, a special notation has been developed. The symbol $E[\cdots]$ will be used to denote the expected value of whatever appears within the brackets. For example, the expected value of a function $\theta(X)$ will be denoted by $E[\theta(X)]$.


Definition 3.21

$$E[\theta(X)] = \sum_{\text{all } x} \theta(x) \cdot f(x), \quad x \text{ discrete}$$

$$E[\theta(X)] = \int_{-\infty}^{\infty} \theta(x) f(x)\,dx, \quad x \text{ continuous.}$$

Definition 3.22

$$E[\theta(X, Y)] = \sum_{\text{all } x} \sum_{\text{all } y} \theta(x, y) f(x, y), \quad x \text{ and } y \text{ discrete}$$

$$E[\theta(X, Y)] = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} \theta(x, y) f(x, y)\,dy, \quad x \text{ and } y \text{ continuous.}$$

Theorem 3.25  For expected values, the following properties hold:

(1)  $E[c] = c$ where c is a constant.

(2)  $E[c\,\theta(X, Y)] = c\,E[\theta(X, Y)]$.

(3)  $E\left[\sum_{i=1}^{k} c_i \theta_i(X, Y)\right] = \sum_{i=1}^{k} c_i E[\theta_i(X, Y)]$.

Definition 3.23  The kth moment with respect to the origin of X is denoted by $\mu'_k = E[X^k]$.

Definition 3.24  $\mu'_1$ is known as the mean, and it is commonly denoted by $\mu$.

Definition 3.25  The kth moment about the mean, or the kth central moment, of X is denoted by $\mu_k = E[(X - \mu)^k]$.

Definition 3.26  $\mu_2$ is known as the variance, and it is commonly denoted by $\sigma^2$.

Definition 3.27  The positive square root of the variance, $\sigma$, is known as the standard deviation.

Theorem 3.26  $\sigma^2 = \mu'_2 - \mu^2 = E[X^2] - (E[X])^2$.
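The computing formula for the variance, E[X^2] - (E[X])^2, can be compared with the second central moment for any small discrete distribution. The sketch below assumes a hypothetical probability function on the values 1 to 4 and uses exact fractions; the function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical probability function of a discrete random variable X
pf = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}

def moment_about_origin(k):
    # kth moment with respect to the origin, E[X^k]
    return sum(x ** k * p for x, p in pf.items())

def central_moment(k):
    # kth moment about the mean, E[(X - mean)^k]
    mu = moment_about_origin(1)
    return sum((x - mu) ** k * p for x, p in pf.items())

mean = moment_about_origin(1)
variance_direct = central_moment(2)
variance_shortcut = moment_about_origin(2) - mean ** 2
print(mean, variance_direct, variance_shortcut)
```

The two variance values agree exactly; the shortcut form is usually the more convenient one for hand computation.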

Definition 3.28  When dealing with two random variables, the product moments of X and Y are defined by $\mu'_{rs} = E[X^r Y^s]$.

Definition 3.29  The central product moments are defined by $\mu_{rs} = E[(X - \mu_X)^r (Y - \mu_Y)^s]$ where $\mu_X = E[X]$ and $\mu_Y = E[Y]$.

Definition 3.30  $\mu_{11}$ is known as the covariance of X and Y, and it is commonly denoted by $\sigma_{XY}$.

NOTE: The variance of X, $\sigma_X^2$, is sometimes written as $\sigma_{XX}$. Similarly, $\sigma_Y^2 = \sigma_{YY}$. These alternative notations show the close relation between variances and covariances.

Definition 3.31  The product moment correlation between X and Y is defined by $\rho_{XY} = \sigma_{XY}/\sigma_X \sigma_Y$. It should be noted that $-1 \le \rho_{XY} \le 1$.
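The covariance and the product moment correlation can be computed from a joint probability function by applying the expected-value definition directly. The sketch below assumes a hypothetical joint probability function on four points; the names `joint` and `expect` are illustrative and not from the text.

```python
from fractions import Fraction
from math import sqrt, isclose

# Hypothetical joint probability function f(x, y) for X, Y taking values 0 or 1
joint = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5)}

def expect(g):
    # E[g(X, Y)] as a probability-weighted sum over all (x, y)
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))   # central product moment (1, 1)
rho = float(cov_xy) / sqrt(float(var_x) * float(var_y))
print(cov_xy, rho)
```

Here the probability is concentrated on the points where X and Y agree, so the correlation is positive; a joint function that factors into the product of its marginals would give a covariance of zero.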


Theorem 3.27

Theorem 3.28  If X and Y are statistically independent,


OTHER   DESCRIPTIVE   MEASURES 

Definition 3.32  The value of x such that $F(x) = p$ is called the 100p fractile of the distribution of the random variable X.

Definition 3.33  When $p = 0.5$ in Definition 3.32, the corresponding value of x is known as the median of the distribution.

Definition 3.34  The mode of a distribution is that value of x for which $f(x)$ is a maximum.
SPECIAL PROBABILITY DISTRIBUTIONS

Certain  distributions  occur  so  often  in  statistical  problems  that  they 
merit  special  attention.  Some  of  these  are  tabulated  in  Tables  3.1  and 
3.2.  Since  most  applications  involving  these  distributions  require  the 
use  of  probabilities  associated  with  the  distributions,  it  is  convenient 
to  have  available  adequate  tables  of  such  probabilities.  Accordingly, 
tables  for  the  Poisson,  standard  normal,  chi-square,  "Student's"  t,  and 
F  distributions  are  presented  in  Appendices  2  through  6.  Each  of  these 
tables  is  given  in  cumulative  form,  that  is,  in  terms  of  the  cumulative 
distribution  function,  so  that  the  reader  will  have  to  learn  only  one 
method  of  reading  the  tables. 

Problems 

3.1  A sample of 3 TV sets is selected from a lot of 30 sets. If there are 5 defective sets in the lot, what is the probability the sample will contain no defectives? 3 defectives? 1 defective and 2 nondefectives?

3.2  A buyer will accept a lot of 10 TV sets if a sample of 3, selected at random, contains no defective sets. What is the probability of accepting a lot of 10 that contains 5 defectives?

3.3  An  electrical  circuit  consists  of  4  switches  in  series.  Assume  that  the 
operations  of  the  4  switches  are  statistically  independent.  If  for  each 
switch  the  probability  of  failure  (i.e.,  remaining  open)  is  0.02,  what  is 
the  probability  of  circuit  failure? 

3.4  Rework  the  preceding  problem  for  the  case  where  the  circuit  consists 
of  4  switches  in  parallel. 

3.5  Defects are classified as type A, B, or C, and the following probabilities have been determined from available production data: P(A) = 0.20, P(B) = 0.16, P(C) = 0.14, P(AB) = 0.08, P(AC) = 0.05, P(BC) = 0.04, and P(ABC) = 0.02. What is the probability that a randomly selected item of product will exhibit at least one type of defect? If an item exhibits at least one type of defect, what is the probability that it exhibits both A and B defects?

3.6  An  electrical  assembly  consists  of  two  parts  connected  in  series  in  the 
order:  A  followed  by  B.  The  probability  that  part  A  is  defective  is 
0.025  and  the  probability  that  part  B  is  defective  is  0.011.  What  is  the 
probability  of  having  a  defective  assembly?  A  nondefective  assembly? 
An  assembly  that  fails  only  because  part  B  is  defective? 

3.7  Suppose the probability that a certain piece of air-borne electronic equipment will not be in working order after its first flight is 0.40, and the probability of failure drops to one-half its previous value after each succeeding flight. (Assume no repair and replacement.) What is the probability the equipment will be in working order after three flights? After four flights given it has survived two flights?

3.8  Consider a four-engine aircraft (two on each wing) where the probability of an engine failure is 0.05. Assume that the probability of one engine failing is independent of the behavior of the others. What is the probability of a crash if the plane can fly on any two engines? If the plane requires at least one engine operating on each side in order to remain in the air?
3.9  Suppose 3 defective dry cells are mixed in with 7 nondefectives, and you start testing them one at a time. What is the probability that you will find the last defective on the sixth test?

3.10  Three operators (A, B, and C) alternate in operating a certain machine. The number of parts produced by A, B, and C are in the ratio 3:4:3 and, of the parts produced, 1 per cent of A's, 2 per cent of B's, and 5 per cent of C's are defective. If a part is drawn at random from the output of their machine, what is the probability it will be defective?

3.11  Referring  to  Problem  3.10,  what  is  the  probability  that,  if  a  defective 
part  is  selected,  it  was  produced  by  A?  by  B?  by  C? 

3.12  If $f(x, y) = \exp\{-(x + y)\}$ for $x > 0$, $y > 0$, find:

(a) $f_1(x)$, (b) $f_2(y)$, (c) $F_1(x)$, (d) $F_2(y)$, (e) $f(y \mid x)$, (f) $f(x \mid y)$, (g) $F(x, y)$, (h) $F(y \mid x)$, (i) $F(x \mid y)$.

3.13  If $f(x, y) = 3x$ for $0 < y < x$, $0 < x < 1$, find the same functions as asked for in the preceding problem.

3.14  If $f(x, y) = 24y(1 - x - y)$ over the triangle bounded by the axes and the line $x + y = 1$, find the same functions as asked for in Problem 3.12.

3.15  During  the  course  of  a  day,  a  machine  turns  out  either  0,   1,  or  2 
defective  items  with  probabilities  |,  f,  and  £,  respectively.  Calculate 
the  mean  and  variance. 

3.16  Given that the number of accidents occurring at a particular intersection between 10:00 P.M. and midnight on Saturday is 0, 1, 2, 3, or 4 with probabilities 0.90, 0.04, 0.03, 0.02, 0.01, respectively, determine the expected number of accidents.

3.17  Suppose  that  the  life  in  hours  of  a  certain  type  of  tube  has  the  p.d.f. 
/(#)  =  a/x*}  #>500,   and /0*0  =0,   x  <5QQ.  Find  the  c.d.f.   Determine 
the  mean  and  variance.  What  is  the  probability  a  tube  will  last  at 
least  1,000  hours? 

3.18  A  submarine  carries  three  missiles.  Assuming  the  only  error  is  in  one 
direction  (e.g.,  a  range  error  but  no  sideways  error)  and  that  a  hit 
within  40  miles  of  the  target  is  considered  a  success,   compute  the 
probability  of  a  successful  operation  (i.e.,  an  operation  in  which  at 
least  one  hit  is  a  success)  if  all  three  missiles  are  launched  and  the 
error  p.d.f.  is: 

f(x) = (100 + x)/10,000     -100 < x < 0

     = (100 - x)/10,000     0 < x < 100

     = 0                    elsewhere.

3.19  Referring  to  the  previous  problem,   the  submarine  can  carry  eight 
missiles of a smaller size. However, in this case a hit must be within
15  miles  to  be  successful.  Assuming  the  same  p.d.f.,  should  the  light 
or  heavy  missiles  be  used? 

3.20  A service station will be supplied with gasoline once a week. Its weekly
volume of sales in thousands of gallons is predicted by the p.d.f.
f(x) = 5(1 - x)^4 for 0 < x < 1. Determine what the capacity of its
underground tank should be if the probability that its supply will be
exhausted in a given week is to be 0.01.

3.21  Show  that  the  correlation  between  two  random  variables  is  0  if  they 
are  statistically  independent. 

3.22  Let X have the marginal density f_1(x) = 1 for -1/2 < x < 1/2, and let the
conditional density of Y given X be

= 1;  -x < y < 1 - x,  0 < x
= 0;  elsewhere.

Find the correlation between X and Y. Discuss the relationship
between correlation and statistical independence.

3.23  A process is producing parts that are, on the average, 1 per cent
defective. Ten parts are selected at random from the process and the
process  is  stopped  if  one  or  more  of  the  ten  are  defective.  What  is  the 
probability  that  the  process  will  be  stopped? 

3.24  In  inspecting  1,000  welded  joints  performed  by  a  certain  welder  using 
a specific process, 150 defective joints were discovered. If the welder
is  about  to  weld  5  joints,  what  is  the  probability  of  getting  no  defective 
joints?  of  one?  of  two?  of  two  or  more?  Discuss  any  assumptions  you 
make  in  solving  this  problem. 

3.25  A  large  number  of  rivets  is  used  in  assembling  an  airplane.    It  has 
been  determined  that  the  probability  distribution  for  the  number  of 
defective rivets is Poisson with λ = 2. Find the probability that the
number  of  defective  rivets  in  a  plane  will  be  no  more  than  two. 

3.26  Suppose  there  is  an  average  of  1  typographical  error  per  10  pages  in 
a  certain  book.  What  is  the  probability  that  a  30-page  chapter  will 
contain  no  errors? 

3.27  A telephone switchboard handles, on the average, 600 calls during the
rush hour. The board can make a maximum of 20 connections per
minute.  What  is  the  probability  the  board  will  be  overtaxed  in  any 
given  minute  during  the  rush  hour? 

3.28  Assuming  a  normal  distribution,  find: 

(a) P(-3 < Y < -1); given μ = 0, σ = 1.

(b) P(-3 < Y < 0.5); given μ = 0, σ = 1.

(C)  p(  —  8<F  <0);  Rivon/A  — 2,  <r*^4 

02)  P(4<F<50);  given  M«*  —0.1 ,  <r*«4 

O)  J^FSrS);  Rivon/A-0,  <ra«l 

(f) P(Y < -3); given μ = 2, σ² = 4.

3.29  Assuming a chi-square distribution, find:

c«) 


)  for  v«lf> 
(r)    P(23.8  <xs  <3(K4)  for  v  «  24 
(d)  P(x 


3.30  Assuming a t-distribution, find:
(a) P(|t| > 2.015) for ν = 5
(b) P(t > 2.015) for ν = 5
(c)    ;>(  —  1.341  «<2.  121)  for  p«1 
(d) P(t < 1.5) for ν = 20.

3.31  Assuming an F-distribution, find:

for  Vi-11,  vt*tt 


(r)    W  >  7,79)  for  ^  «  «,  va  -  11 
(rf)   7*  (0.221  </^S2.62)  for  n«6, 
3.32  The finished diameter on armored electric cable is normally distributed
with mean 0.77 inch and standard deviation 0.01 inch. What is the
probability the diameter will exceed 0.795 inch? If the engineering
specifications are 0.78 ± 0.02 inch, what is the probability of a
defective piece of cable?

3.33  If the p.d.f. for the life of a certain type of component is
f(x) = (1/100) exp{-x/100} for x > 0, what is the probability that a randomly
selected component will last 400 hours? That it will last 400 hours
given that it has already survived 200 hours? If an assembly uses
three of these components in series, what is the probability that an
assembly incorporating three randomly selected components will not
fail because of component failure?

3.34  The hazard rate is defined as f(x)/{1 - F(x)}. If f(x) = (1/θ) exp{-x/θ}
for x > 0, what is the hazard rate?
CHAPTER 4

ELEMENTS  OF  SAMPLING  AND 
DESCRIPTIVE  STATISTICS 

IN  THIS  CHAPTER  we  shall  discuss  the  basic  ideas  of  sampling  and  the 
presentation  of  sample  data.  Certain  useful  statistics  of  a  summarizing 
nature  will  be  defined  and  efficient  methods  of  calculation  outlined. 

To begin the discussion of sampling, the reason for taking samples
should be mentioned. The reason is usually one of the following:
(1) Due to limitations of time, money, or personnel, it is impossible to
study every item in the population; (2) the population, as defined, may
not physically exist; (3) to examine an item may require that the item
be destroyed.

Before proceeding to the actual mechanism of obtaining samples and
the analyzing of data therefrom, it will be wise to define some terms
frequently encountered.

4.1      THE  POPULATION  AND  THE  SAMPLE 

In statistical work it is important to know whether we are dealing
with a complete population of observations or with a sample of
observations selected from a specified population.

A population is defined as the totality of all possible values
(measurements or counts) of a particular characteristic for a specified group of
objects. Such a specified group of objects is called a universe. Obviously
a universe can have several populations associated with it. Some
examples of universes and populations are:

(1) The employees of Arizona State University as of 5:00 P.M. on
December 4, 1962.

(2) Associated with the preceding universe are many populations,
for example, the population of blood types, the population of
weights, the population of heights, etc.

(3) The universe of all single-dwelling units in Tempe, Arizona, on
December 31, 1962.

(4) Associated with this universe of single-dwelling units are such
populations as the number of rooms per unit, the number of
people residing in each unit, and so on.

(5) A universe may contain only one object, such as a piece of
steel pipe, and the population consists of all possible
measurements of its inside diameter.

(6) A universe might consist of all vacuum tubes of a specific type
manufactured by a given manufacturer under similar
conditions.

(7) Populations associated with the preceding universe are:
lengths of life, function on test, etc.

These examples should suffice to impress upon the reader the
importance of clearly defining the population under investigation.

The concept of a sample, as opposed to a population, is very
important. A sample is just a part of a population selected according to
some rule or plan. The important things to know are: (1) that we are
dealing with a sample and (2) which population has been sampled.

If  we  are  dealing  with  the  entire  population,  our  statistical  work  will 
be  primarily  descriptive.  On  the  other  hand,  if  we  are  dealing  with  a 
sample,  the  statistical  work  will  not  only  describe  the  sample  but  also 
provide  information  about  the  sampled  population. 

4.2     TYPES  OF  SAMPLES 

There  are  several  types  or  classes  of  samples  encountered  in  practice. 
The  characteristics  which  distinguish  one  type  from  another  are :  (1)  the 
manner  in  which  the  sample  was  obtained,  (2)  the  number  of  variables 
recorded,  and  (3)  the  purpose  for  which  the  sample  was  drawn.  The 
last  two  characteristics  listed  are  easily  understood  in  any  practical 
situation  although  No.  3  is  frequently  not  clearly  stated  and  perhaps 
even  forgotten.  The  manner  of  obtaining  the  sample  is  very  important 
and  will  be  discussed  further. 

Samples may be grouped into two broad classes when their method
of selection is considered, namely, those which are selected by
judgment and those which are selected according to some chance
mechanism. Samples selected according to some chance mechanism are
known as probability samples if every item in the population has a
known probability of being in the sample. In particular, if each item in
the population has an equal chance of occurring in the sample, then the
sample is known as a random sample.

Why are random samples preferred to subjectively selected samples?
An answer to this question may be formulated as follows: A good
sample is one from which generalizations to the population can be
made; a bad sample is one from which they cannot be made. To
generalize from a sample to a population, we need to be able to deduce from
any assumptions about the population whether the observed sample is
within the range of sampling variation that might occur for that
population under the given method of sampling. Such deductions can be
made if, and only if, the laws of mathematical probability apply. The
purpose of randomness is to insure that these laws do apply. If we had
equally well-established and stable laws of personal bias, subjective
sampling could be used.

We  can  sample  from  different  populations  in  various  ways: 

(1) A random sample may be drawn from a population specified
by a continuous probability density function. In this case, the
question of sampling with or without replacement does not
arise.

(2) A random sample may be drawn from an infinite population
specified by a discrete probability density function. Again, the
question of with or without replacement does not arise.

(3) A random sample may be drawn from a finite population
(specified by a discrete probability density function) where the
sampling is performed with replacement. Sampling with
replacement effectively makes the population infinite.

(4) If sampling from a finite population is performed without
replacement, we no longer have a random sample as defined
earlier. Sometimes, a "random" sample for this situation is
defined as one in which each set of n objects has an equal chance
of being the sample of size n.

Other samples of a specialized type are sometimes encountered.
Two of these are:

Stratified Random Sample

The population is first subdivided into subpopulations or strata.
Then a simple random sample is drawn from each stratum.

Systematic Random Sample

Consider the N units in the population to be arranged in some
order. If a sample of size n is required, take a unit at random from
the first k = N/n units and then take every kth unit thereafter.
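The systematic rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_sample`, the population of 100 numbered units, and the fixed seed are all hypothetical, and N is assumed to be an exact multiple of n.

```python
import random

def systematic_sample(units, n, rng=random):
    """Pick a random start among the first k = N // n units, then every k-th unit."""
    k = len(units) // n            # sampling interval; assumes N is a multiple of n
    start = rng.randrange(k)       # 0-based position of the randomly chosen first unit
    return units[start::k][:n]

lot = list(range(1, 101))          # hypothetical ordered population, N = 100
sample = systematic_sample(lot, 10, random.Random(1))
print(sample)
```

Every unit has the same chance (1/k) of entering the sample, but only k distinct samples are possible, which is one reason the analysis differs from that of a simple random sample.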

Having defined the various types of sampling frequently encountered,
we note the following caution: The methods of analysis will not be the
same for each type of sampling. Great care must be exercised to use the
proper method of analysis; failure to do so can lead to serious errors in
judgment when the decision-making stage is reached.

4.3      SAMPLING FROM A SPECIFIED POPULATION

How do we go about selecting a sample from a specified population?
Some examples will serve as explanation: (1) Suppose the population
consists of only two values. One of them can be selected at random by
tossing an unbiased coin. (2) Consider a population consisting of 100
items. One hundred numbered tickets (corresponding to our population
of items) can be placed in a bowl and tickets selected in a chance
manner. (3) In the previous example, the sample values could have been
selected using a table of random numbers.

To illustrate the use of a table of random numbers, consider the
problem of obtaining a sample of n = 5 batteries from a lot of N = 25.
First, number the batteries: 01, 02, ..., 25. Second, refer to a table
of random numbers such as given in Appendix 7 and proceed through
the following steps.

(1) Select by any method one of the four pages of tabled values.

(2) Without direction, bring a pencil point down on the printed
page so as to hit a random digit.

(3) Read this digit and the next three to the right, for example,
2167.

(4)  Let  the  first  two  of  these  specify  the  row  and  the  last  two  the 
column. 

(5) Go to this point in the table of random numbers and read the
specified digit and the next one to the right. This reads 73.
However, the only possible numbers of use in the specified
problem are 01, 02, ..., 25. Thus, it is necessary to run down
the column until five suitable numbers are observed. In order,
the numbers observed are 73, 48, 54, 01, 18, 38, 60, 70, 44,
30, 41, 86, 23, 64, 31, 71, 68, 64, 13, 12. The five numbers no
greater than 25 (01, 18, 23, 13, and 12) specify the batteries
to be included in the sample.

(6)  Appropriate  changes  should  be  made  in  step  No.  5  to  handle 
different  problems. 
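The selection in step (5) can be reproduced with a short Python sketch. The function name `select_from_digits` is hypothetical, and skipping a number that has already been chosen is an added assumption (sampling without replacement); the text itself does not say what to do with repeats. The last line shows how a pseudo-random generator could stand in for the printed table.

```python
import random

def select_from_digits(two_digit_numbers, lot_size, n):
    """Keep the first n distinct numbers that fall between 1 and lot_size."""
    chosen = []
    for value in two_digit_numbers:
        if 1 <= value <= lot_size and value not in chosen:
            chosen.append(value)
        if len(chosen) == n:
            break
    return chosen

# Two-digit numbers read down the column, as listed in step (5).
observed = [73, 48, 54, 1, 18, 38, 60, 70, 44, 30,
            41, 86, 23, 64, 31, 71, 68, 64, 13, 12]
batteries = select_from_digits(observed, lot_size=25, n=5)
print(batteries)  # [1, 18, 23, 13, 12]

# Hypothetical alternative: let a seeded generator draw 5 of the 25 labels.
alternative = random.Random(0).sample(range(1, 26), 5)
```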

4.4      PRESENTATION    OF   DATA 

Having  obtained  a  random  sample  from  a  specified  population,  some 
way  of  reducing  it  to  an  understandable  form  is  called  for.  To  illustrate 
the  usual  techniques  for  presenting  such  data,  consider  the  data  in 
Table  4.1. 

In  this  form  the  data  are,  to  say  the  least,  confusing.  It  is  not  easy  to 
visualize  any  pattern  in  the  observed  values,  nor  is  it  easy  to  estimate 
the  average  function  time.  We  find  it  convenient,  therefore,  to  arrange 
the  values  in  a  frequency  distribution  as  in  Table  4.3.  To  accomplish 
this, we first make use of a tally sheet as shown in Table 4.2.
Incidentally, Table 4.3 provides us with an array, that is, the values arranged
in order of magnitude.

Upon examination of Table 4.3, we note that all the observations are
greater than or equal to 59 milliseconds and less than or equal to 70.5
milliseconds. That is, we have established the range of our data.
Further, we can roughly estimate the average function time to be 65
milliseconds.

However, since it takes too long to scan all the values in Table 4.3,
the data are still in rather cumbersome form. To remedy this, it is
customary to condense the data even more by tabulating only the
frequencies associated with certain intervals, usually referred to as class
intervals. To set up class intervals, a good working rule is to have no
fewer than 5 and no more than 15 intervals. Also, the limits of the class
intervals should be chosen so that there is no ambiguity in assigning
observed values to the classes. This latter requirement is most easily
satisfied by: (1) selecting class limits which carry one more decimal
place than the original data, or (2) proper use of inequality and equality
signs. We shall adopt the second of these two procedures in this text.
Using class intervals of width 1 millisecond, we get the data in the form
of Table 4.4. In this table we have used the letter X to represent the
various function times in milliseconds. To interpret the values and
TABLE 4.1-Function Times of 201 Explosive Actuators
Measured in Milliseconds
(Hypothetical Data)

64.0   61.5   69.0   65.25  69.0   66.0   63.5   65.25  66.25  67.25
67.25  62.5   61.75  63.5   63.75  66.5   66.0   65.5   65.25  66.5
64.5   67.75  64.5   68.0   63.75  68.0   70.5   68.0   65.0   62.0
62.75  61.5   60.0   65.75  66.0   62.0   65.75  60.75  63.75  62.0
70.25  64.75  68.5   65.0   66.5   64.0   67.0   67.0   63.0   64.0
67.0   63.25  65.25  67.5   65.0   67.5   64.5   68.0   63.5   68.75
63.0   66.25  67.0   65.25  64.0   65.25  63.0   67.0   65.5   62.0
64.5   66.25  65.0   63.75  67.5   65.5   64.75  67.0   68.0   59.0
64.5   67.0   67.75  63.25  63.25  65.5   64.0   67.0   64.5   67.5
65.0   61.0   64.5   63.0   66.5   66.0   65.0   61.25  69.5   64.0
68.0   64.5   66.5   64.25  65.0   62.25  63.5   63.0   67.0   65.25
65.0   65.0   65.25  65.25  63.0   65.5   65.0   62.0   64.0   62.5
64.75  61.5   62.75  68.5   63.5   63.0   64.5   67.0   61.75  66.25
64.75  65.5   62.75  68.5   61.5   63.0   65.5   65.5   63.0   65.5
66.75  69.5   65.25  63.5   66.0   62.25  62.5   61.5   68.0   63.75
66.0   64.0   67.0   67.75  65.25  67.75  68.0   63.5   63.25  63.0
61.75  69.0   65.0   62.5   62.0   64.75  64.0   66.75  66.0   64.5
64.25  62.5   66.5   66.75  64.5   60.0   65.0   66.0   64.5   66.25
65.75  65.5   64.5   62.0   65.25  64.25  63.0   64.0   66.75  65.25
63.75  67.0   61.0   70.0   70.0   65.5   65.25  64.5   67.5   65.75
70.0

frequencies, we proceed as follows: One actuator had a function time
of more than 58 milliseconds but less than or equal to 59 milliseconds;
two actuators had a function time of more than 59 milliseconds but less
than or equal to 60 milliseconds; and so on. Please note that we have
less information available in Table 4.4 than in Table 4.3. This is
because we no longer know the individual values but only in which class
interval they fall. But the loss in accuracy is balanced to some extent
by the gain in conciseness. The column headed "relative frequency"
tells us what proportion of the total observations fall in each class. The
values are found by dividing each class frequency by the total
frequency.
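As a small sketch of the class-interval convention (lower limit excluded, upper limit included), the Python below assigns each value to the class whose upper limit is the value rounded up to a whole millisecond. The helper name `class_frequencies` is hypothetical, and only the first ten function times listed in Table 4.1 are used, so the counts are not those of Table 4.4.

```python
import math
from collections import Counter

def class_frequencies(values):
    """Count values in classes (u - 1) < X <= u of width 1, keyed by upper limit u."""
    return Counter(math.ceil(v) for v in values)

# First ten function times listed in Table 4.1 (milliseconds).
times = [64.0, 61.5, 69.0, 65.25, 69.0, 66.0, 63.5, 65.25, 66.25, 67.25]
freq = class_frequencies(times)
rel = {u: f / len(times) for u, f in sorted(freq.items())}
for u, r in rel.items():
    print(f"{u - 1} < X <= {u}: f = {freq[u]}, r.f. = {r:.3f}")
```

Because a value such as 64.0 rounds up to itself, it lands in the class 63 < X <= 64, which is exactly the "less than or equal to" convention adopted in the text.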
Conforming to the adage that a "picture is worth ten thousand
words," we often represent our distribution by a chart or frequency
histogram. This is illustrated by Figure 4.1. The dotted line pictures a
frequency polygon. Note that the frequency histogram is formed by
erecting rectangles over the class intervals, the height of each rectangle
agreeing with the class frequency if the left-hand scale is read, and with
the class relative frequency if the right-hand scale is read. The
frequency polygon is formed by joining the midpoints at the tops of the
rectangles.

It  is  also  to  be  noted  that  the  frequency  histogram  and  polygon,  as 
well  as  the  frequency  distribution,  give  us  not  only  an  estimate  of  the 
average  value  but  also  an  idea  of  the  amount  of  variability  present  in 
the  data. 

Another convenient way of tabulating data is to prepare a
cumulative frequency distribution showing the number of observations less
than or equal to a specified value. The figures are obtained by adding, in
cumulative fashion, the frequencies recorded in Table 4.4. This is
illustrated in Table 4.5. The graph which arises from this table is shown

TABLE 4.2-Tally Sheet for Data of Table 4.1
TABLE 4.3-Frequency Distribution for Data of Table 4.1

Function Time    Number of Actuators Exhibiting    Relative Frequency
    (MS)         Given Function Time =                  (r.f.)
                 Frequency (f)

    59.0                  1                             0.005
    60.0                  2                             0.010
    60.75                 1                             0.005
    61.0                  2                             0.010
    61.25                 1                             0.005
    61.5                  5                             0.025
    61.75                 3                             0.015
    62.0                  7                             0.035
    62.25                 2                             0.010
    62.5                  5                             0.025
    62.75                 3                             0.015
    63.0                 11                             0.055
    63.25                 4                             0.020
    63.5                  7                             0.035
    63.75                 6                             0.030
    64.0                 10                             0.050
    64.25                 3                             0.015
    64.5                 14                             0.070
    64.75                 5                             0.025
    65.0                 12                             0.060
    65.25                14                             0.070
    65.5                 11                             0.055
    65.75                 4                             0.020
    66.0                  8                             0.040
    66.25                 5                             0.025
    66.5                  6                             0.030
    66.75                 4                             0.020
    67.0                 12                             0.060
    67.25                 2                             0.010
    67.5                  5                             0.025
    67.75                 4                             0.020
    68.0                  8                             0.040
    68.5                  3                             0.015
    68.75                 1                             0.005
    69.0                  3                             0.015
    69.5                  2                             0.010
    70.0                  3                             0.015
    70.25                 1                             0.005
    70.5                  1                             0.005

    Totals              201                             1.005*

* Total exceeds 1.000 because of errors of rounding.
TABLE 4.4-Frequency Distribution (Using Class Intervals) for Data
of Table 4.1

Function Time    Number of Actuators With Function    Relative Frequency
    (MS)         Time in Specified Class Interval          (r.f.)
                 = Frequency (f)

58 < X ≤ 59                  1                             0.005
59 < X ≤ 60                  2                             0.010
60 < X ≤ 61                  3                             0.015
61 < X ≤ 62                 16                             0.080
62 < X ≤ 63                 21                             0.104
63 < X ≤ 64                 27                             0.134
64 < X ≤ 65                 34                             0.169
65 < X ≤ 66                 37                             0.184
66 < X ≤ 67                 27                             0.134
67 < X ≤ 68                 19                             0.095
68 < X ≤ 69                  7                             0.035
69 < X ≤ 70                  5                             0.025
70 < X ≤ 71                  2                             0.010

Totals                     201                             1.000
FIG. 4.1-Frequency histogram and polygon plotted from Table 4.4.
TABLE 4.5-Cumulative Frequency Distribution Formed From Table 4.4

Function Time    Number of Actuators With Function    Relative Cumulative
    (MS)         Time Less Than or Equal to the       Frequency (r.c.f.)
                 Specified Value = Cumulative
                 Frequency (c.f.)

     58                      0                             0.000
     59                      1                             0.005
     60                      3                             0.015
     61                      6                             0.030
     62                     22                             0.109
     63                     43                             0.214
     64                     70                             0.348
     65                    104                             0.517
     66                    141                             0.701
     67                    168                             0.836
     68                    187                             0.930
     69                    194                             0.965
     70                    199                             0.990
     71                    201                             1.000

in Figure 4.2 and is quite helpful in interpreting the observed data.
Note the cumulative (ogive) curve is plotted by joining the right-hand
endpoints at the tops of the rectangles. This curve (see dotted line) is
formed as just mentioned because it represents the cumulative
frequency up to and including the upper class limit.
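The running totals of Table 4.5 can be reproduced with a short Python sketch. The class frequencies used here are the successive differences of the cumulative frequencies printed in Table 4.5; the variable names are illustrative only.

```python
from itertools import accumulate

upper_limits = list(range(59, 72))                        # upper class limits, 59 to 71 ms
freqs = [1, 2, 3, 16, 21, 27, 34, 37, 27, 19, 7, 5, 2]    # class frequencies

cum = list(accumulate(freqs))                 # cumulative frequency (c.f.)
n = cum[-1]                                   # total number of observations
rel_cum = [round(c / n, 3) for c in cum]      # relative cumulative frequency (r.c.f.)

for limit, c, r in zip(upper_limits, cum, rel_cum):
    print(f"X <= {limit}: c.f. = {c:3d}, r.c.f. = {r:.3f}")
```

Each cumulative figure is attached to the upper class limit, which is why the ogive is drawn through the right-hand endpoints of the rectangles.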

4.5     CALCULATION   OF  SAMPLE  STATISTICS 

If a sample is to be described in any reasonable manner, it is desirable
to calculate certain representative values which summarize a great deal
of information. Not all of the representative values to be described in
the following pages are of equal importance. However, we have gone
into considerable detail in defining them all so that the reader will be
aware of their existence, uses, advantages, and disadvantages.

FIG. 4.2-Cumulative frequency histogram and polygon
plotted from Table 4.5.

4.6      THE  ARITHMETIC    MEAN 

It is not surprising that the ordinary arithmetic mean is the most
common of these representative values. The sample mean, denoted by
$\bar{X}$, is defined as the arithmetic average of all the values in the sample.
The formula for calculating the sample mean is

$$\bar{X} = (X_1 + \cdots + X_n)/n = \sum_{i=1}^{n} X_i / n = \sum X / n \tag{4.1}$$

where there are n observations in the sample.

Example  4.1 

Given the sample values 3, 4, -2, 1, and 4, calculate the mean.

$\bar{X}$ = (3 + 4 - 2 + 1 + 4)/5 = 10/5 = 2.

The above example illustrates the method of computing the
arithmetic mean. It is to be noted that the arithmetic mean is affected by
every item in the sample and is greatly affected by extreme values.
Two interesting properties of the arithmetic mean are: (1) the sum of
the deviations from the arithmetic mean is zero, and (2) the sum of the
squares of the deviations from the arithmetic mean is less than the sum
of the squares of the deviations from any other value.
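Both properties can be checked numerically on the sample of Example 4.1. The sketch below is only a spot check, not a proof: the helper name `sum_sq_dev` and the particular comparison values are arbitrary choices.

```python
sample = [3, 4, -2, 1, 4]                  # sample values of Example 4.1
mean = sum(sample) / len(sample)

def sum_sq_dev(values, c):
    """Sum of squared deviations of the values from the reference point c."""
    return sum((x - c) ** 2 for x in values)

sum_dev = sum(x - mean for x in sample)    # property (1): deviations sum to zero
print(mean, sum_dev, sum_sq_dev(sample, mean))  # 2.0 0.0 26.0

# Property (2): any other reference point gives a larger sum of squares.
for c in (1.5, 2.5, 3):
    print(c, sum_sq_dev(sample, c))
```

For this sample the sum of squares about any point c works out to 26 + 5(c - 2)^2, so it is smallest exactly at the mean.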

As  might  be  expected,  the  arithmetic  mean  has  both  advantages  and 
disadvantages.  Its  advantages  are:  (1)  it  is  the  most  commonly  used 
average,  (2)  it  is  easy  to  compute,  (3)  it  is  easily  understood,  and  (4)  it 
lends  itself  to  algebraic  manipulation.  The  one  major  disadvantage  of 
the  arithmetic  mean  is  that  it  is  unduly  affected  by  extreme  values  and 
may  therefore  be  far  from  representative  of  the  sample. 

Before proceeding to a second representative measure for describing
samples, it will pay us to look at methods of calculating the arithmetic
mean when our data are in the form of a frequency distribution. If for
each different value of X we have a frequency f, then the sample mean
is given by

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_d X_d}{f_1 + f_2 + \cdots + f_d} = \frac{\sum_{i=1}^{d} f_i X_i}{n} \tag{4.2}$$

where there are d different values of X.
Example 4.2

The data in Table 4.3 are of the type just described. Then
$\bar{X}$ = {(1)(59.0) + (2)(60.0) + ... + (1)(70.5)}/201 = 13,071.75/201
= 65.034 milliseconds.
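The calculation of Example 4.2 can be sketched in Python as follows. The dictionary holds each distinct function time with its frequency, tallied from the 201 values of Table 4.1; the names `table_4_3` and `frequency_mean` are illustrative.

```python
# Distinct function times (ms) and their frequencies, tallied from Table 4.1.
table_4_3 = {
    59.0: 1, 60.0: 2, 60.75: 1, 61.0: 2, 61.25: 1, 61.5: 5, 61.75: 3,
    62.0: 7, 62.25: 2, 62.5: 5, 62.75: 3, 63.0: 11, 63.25: 4, 63.5: 7,
    63.75: 6, 64.0: 10, 64.25: 3, 64.5: 14, 64.75: 5, 65.0: 12,
    65.25: 14, 65.5: 11, 65.75: 4, 66.0: 8, 66.25: 5, 66.5: 6,
    66.75: 4, 67.0: 12, 67.25: 2, 67.5: 5, 67.75: 4, 68.0: 8,
    68.5: 3, 68.75: 1, 69.0: 3, 69.5: 2, 70.0: 3, 70.25: 1, 70.5: 1,
}

def frequency_mean(freq_table):
    """Mean of a frequency distribution: sum of f * X divided by n = sum of f."""
    n = sum(freq_table.values())
    return sum(f * x for x, f in freq_table.items()) / n

n = sum(table_4_3.values())
total = sum(f * x for x, f in table_4_3.items())
mean = frequency_mean(table_4_3)
print(n, total, round(mean, 3))  # 201 13071.75 65.034
```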

Many times our data appear in frequency tables where we no longer
know the actual values of the observations but only to which class
interval they belong. In these instances, the best we can do is to
approximate the sample mean. To obtain this approximation we assume that
the values in a particular class interval are uniformly distributed over
the interval.1 This permits us to use the midpoint for each observation
in the interval when calculating the mean. Thus, if we denote the
midpoint of the ith interval by $\xi_i$, and there are k intervals, the sample
mean is approximately

$$\bar{X} \approx \frac{f_1 \xi_1 + \cdots + f_k \xi_k}{f_1 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i \xi_i}{n} \tag{4.3}$$

Example  4.3 

Considering the data of Table 4.4, it is seen that $\bar{X} \approx$ {(1)(58.5)
+ (2)(59.5) + ... + (2)(70.5)}/201 = 13,032.5/201 = 64.838 milliseconds.
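A minimal Python sketch of this grouped-data approximation follows. The class frequencies are those of the 13 one-millisecond classes from 58 < X <= 59 to 70 < X <= 71 (they agree with the cumulative frequencies of Table 4.5); the variable names are illustrative.

```python
freqs = [1, 2, 3, 16, 21, 27, 34, 37, 27, 19, 7, 5, 2]   # class frequencies
midpoints = [58.5 + i for i in range(len(freqs))]         # 58.5, 59.5, ..., 70.5

n = sum(freqs)
total = sum(f * m for f, m in zip(freqs, midpoints))      # sum of f * midpoint
grouped_mean = total / n
print(n, total, round(grouped_mean, 3))  # 201 13032.5 64.838
```

The result differs from the 65.034 milliseconds obtained from the ungrouped values because every observation in a class is replaced by the class midpoint.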


Short-cut methods of calculation for use when machines are not
available are summarized in Equations 4.2a and 4.3a:

$$\bar{X} = X_0 + \frac{\sum f x}{n} \tag{4.2a}$$

$$\bar{X} \approx \xi_0 + \frac{\sum f z}{n}\,(w) \tag{4.3a}$$

where $X_0$ and $\xi_0$ are arbitrary origins, $x = X - X_0$, $z = (\xi - \xi_0)/w$, and
$w$ is the width of a class interval.

Example 4.4

   X       f       x       fx

  10       3     -20      -60
  20       5     -10      -50
  30       8       0        0
  40       2      10       20

          18              -90

In the above table, $X_0$ = 30. Therefore $\bar{X}$ = 30 + (-90/18) = 25.

1 Some writers assume that all the values in an interval are concentrated at
the midpoint.
Example 4.5

Class Interval      f       ξ       z      fz

 5 < X ≤ 15         3      10      -2      -6
15 < X ≤ 25         5      20      -1      -5
25 < X ≤ 35         8      30       0       0
35 < X ≤ 45         2      40       1       2

                   18                      -9

Thus $\bar{X} \approx$ 30 + (-9/18)(10) = 25.
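The coded calculation of Example 4.5 can be checked against the direct midpoint calculation with a few lines of Python. This is a sketch only; the variable names are illustrative, and the origin 30 and width 10 are the values used in the example.

```python
midpoints = [10, 20, 30, 40]    # class midpoints
freqs = [3, 5, 8, 2]            # class frequencies
origin, width = 30, 10          # arbitrary origin and class width

n = sum(freqs)
z = [(m - origin) / width for m in midpoints]             # coded values: -2, -1, 0, 1
coded_mean = origin + width * sum(f * zi for f, zi in zip(freqs, z)) / n
direct_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
print(coded_mean, direct_mean)  # 25.0 25.0
```

The two results agree for any choice of origin; picking an origin near the center of the data merely keeps the hand arithmetic small.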

4.7      THE MIDRANGE

Another representative value of importance, especially when a quick
average is needed, is the midrange. The midrange is defined as

$$MR = \frac{X_{\min} + X_{\max}}{2} \tag{4.4}$$

where $X_{\min}$ is the smallest (minimum) sample value and $X_{\max}$ is the
largest (maximum) sample value. It must be realized that even though
the midrange is quick and easy to compute, it is often inefficient
because all information contained in the intermediate values has been
ignored. Also it can be quite unrepresentative if either the smallest or
largest value is decidedly atypical of all the data.
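A brief Python sketch shows both the convenience and the weakness of the midrange. The function name `midrange` is illustrative, and the value 40 appended below is a made-up atypical observation.

```python
def midrange(values):
    """Average of the smallest and largest sample values."""
    return (min(values) + max(values)) / 2

sample = [3, 4, -2, 1, 4]
print(midrange(sample))           # 1.0
print(midrange(sample + [40]))    # 19.0, pulled far from the bulk of the data

# For Table 4.1 only the extremes matter: 59.0 and 70.5 milliseconds.
print(midrange([59.0, 70.5]))     # 64.75
```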

4.8  THE   MEDIAN 

A representative value frequently employed as an aid in describing a
set of data is the median. The sample median, denoted by M, is the
[(n+1)/2]th observation when the values are arrayed in order of
magnitude. Theoretically, one-half the observations should have a value
less than the median and one-half the observations should have a value
greater than the median. However, in practice it does not always work
out quite this way due to clustering of the observations (see Example
4.6). Regardless, the median is important as a measure of position or
location.

Example  4.6 

If  we  consider  the  data  of  Table  4.3  where  n  =  201,  the  median  is  the 
(201 + 1)/2 = 101st item in the array. Counting down the frequencies in
Table  4.3  we  find  the  101st  item  to  be  65.  Thus  M  =  65  milliseconds. 

Example  4.7 

Given the sample (2, 3, 4, 6, 6, 7), the median is the (6 + 1)/2 = 3.5th
observation in the array. To avoid ambiguity, it is agreed that the
median will be halfway between the third and fourth observations in
the array. Thus M = 5.
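The [(n+1)/2]th rule, including the halfway convention of Example 4.7, can be sketched in Python as follows. The function name `sample_median` is illustrative; the standard library's `statistics.median` follows the same convention and is shown for comparison.

```python
import statistics

def sample_median(values):
    """Median as the [(n + 1)/2]th value of the ordered sample."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2        # 1-based rank, possibly ending in .5
    lower = ordered[int(position) - 1]
    if position.is_integer():
        return lower
    upper = ordered[int(position)]           # next observation in the array
    return (lower + upper) / 2               # halfway between the two middle values

print(sample_median([2, 3, 4, 6, 6, 7]))      # 5.0
print(sample_median([3, 4, -2, 1, 4]))        # 3
print(statistics.median([2, 3, 4, 6, 6, 7]))  # 5.0
```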

When data are grouped in class intervals as in Table 4.4, the median
cannot be located exactly. However, if we assume that the observations
in each class interval are uniformly distributed over the interval, a close
approximation to the median may be obtained. The first step is to
locate the class in which the median belongs: This is done by adding up
the class frequencies until we find the class which contains the
[(n+1)/2]th observation. Of course, if a cumulative frequency
distribution has been formed as in Table 4.5, the median class is easily located.
Then, the sample median may be approximated using the equation

$$M \approx L_M + \left(\frac{\frac{n+1}{2} - \sum f}{f_M}\right)(w) \tag{4.5}$$

where

$L_M$ = lower limit of the median class,
$n$ = number of observations in the sample,
$\sum f$ = sum of the frequencies in all classes preceding the median class,
$f_M$ = frequency in the median class, and
$w$ = width of the median class.

Example 4.8

Considering the data of Table 4.4, we see that

$$M \approx 64 + \left( \frac{101 - 79}{34} \right)(1) = 64.65 \text{ milliseconds.}$$

It is possible to approximate the median graphically from a cumulative
frequency (ogive) curve using the relative cumulative frequency
(r.c.f.) scale. This will be illustrated in Section 4.9.

The median, a measure of position, is affected by the number of items
but not by the magnitude of extreme values. Two characteristics of the
median which are of interest are: (1) the sum of the absolute values of
the deviations from the median is less than the sum of the absolute
values of the deviations from any other point of reference, and (2)
theoretically the probability is 1/2 that an observation selected at random
from a set of data will be less than (greater than) the median.

Some advantages and disadvantages of the median with which one
should be familiar if he wants to make proper use of this statistic will
now be mentioned. The advantages are: (1) it is easy to calculate, and
(2) it is often more typical of all the observations than is the arithmetic
mean since it is not affected by extreme values. The disadvantages are:
(1) the items must be arrayed before the median can be obtained and
(2) it does not lend itself to algebraic manipulation.

4.9      PERCENTILE,  DECILE,  AND  QUARTILE  LIMITS 

In this section we shall consider locating various values which divide
the population or sample into groups according to the magnitude of the
observations. The median (see Section 4.8) was obviously one such
value since it divided the array into two groups, each containing 50 per
cent of the observations. We now wish to determine other such values.
Let us consider the most general case first. If we want to locate a
value, say $P_p$, such that p per cent (0 < p < 100) of the observations are
less than $P_p$ and 100 - p per cent of the observations are greater than $P_p$,
we call $P_p$ the upper limit of the pth percentile, and approximate $P_p$ by
the [p(n+1)/100]th observation in the sample array if we start counting
from the smallest value. For example, $P_{67}$ is the upper limit of the 67th
percentile and is approximated by the [67(n+1)/100]th observation in
the sample array. Similarly, $P_{66}$ is the upper limit of the 66th percentile
and is approximated by the [66(n+1)/100]th observation in the sample
array. If we refer to the 67th percentile, we mean the interval from $P_{66}$
to $P_{67}$; in general, the pth percentile is the interval from $P_{p-1}$ to $P_p$.
(NOTE: Percentile limits are special cases of the fractiles introduced
in Definition 3.32.)

Example  4.9 

If we consider the data of Table 4.3, what is the upper limit of the 80th
percentile? $P_{80}$ is approximately the 80(201 + 1)/100 = 161.6th
observation in the array, which is 67 milliseconds.

Example  4.10 

What is the upper limit of the 35th percentile in the sample given in
Example 4.7? $P_{35}$ is approximated by the 35(6 + 1)/100 = 2.45th
observation in the array. To avoid ambiguity, we agree to set $P_{35}$ forty-five
one-hundredths of the way from the second to the third observation
in the array when we count from the smallest value. Thus $P_{35}$ is 0.45 of
the way between 3 and 4, that is, $P_{35}$ = 3.45.
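
The same convention can be sketched in Python. The function name `percentile_limit` is hypothetical, and the clamping to the smallest or largest observation when the computed position falls outside the array is an assumption of this sketch; the text does not discuss that case.

```python
def percentile_limit(values, p):
    """Upper limit of the pth percentile: the [p(n + 1)/100]th observation,
    interpolating linearly when the position is fractional."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1) / 100    # 1-based, counted from the smallest value
    whole = int(position)
    fraction = position - whole
    if whole < 1:                   # assumption: clamp below the first observation
        return ordered[0]
    if whole >= n:                  # assumption: clamp above the last observation
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


sample = [2, 3, 4, 6, 6, 7]            # sample of Example 4.7
print(percentile_limit(sample, 35))    # position 2.45, between 3 and 4
print(percentile_limit(sample, 50))    # the median
```

With p = 50 the rule reduces to the median convention of Section 4.8, and p = 25 and p = 75 give the first and third quartile limits.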

The reason for the word percentile should now be clear: if we locate
$P_1, P_2, \ldots, P_{99}$, we have (theoretically) split our array into 100 parts
(percentiles), each containing 1 per cent of the observations.

The meaning of such terms as decile limits and quartile limits (see
the heading of this section) is now almost obvious. The decile limits
$D_1, D_2, \ldots, D_9$ theoretically split our array into ten parts (deciles),
each containing 10 per cent of the observations. The quartile limits
$Q_1$, $Q_2$, and $Q_3$ theoretically divide our array into four parts (quartiles),
each containing 25 per cent of the observations. No particular methods
of calculation will be presented for decile or quartile limits since they
are only special cases of percentile limits. This is clear once we observe
that

$P_{10} = D_1$        $P_{40} = D_4$                    $P_{75} = Q_3$
$P_{20} = D_2$        $P_{50} = D_5 = Q_2 = M$          $P_{80} = D_8$
$P_{25} = Q_1$        $P_{60} = D_6$                    $P_{90} = D_9$
$P_{30} = D_3$        $P_{70} = D_7$

In Section 4.8 we mentioned the possibility of estimating the median
from the graph of the relative cumulative frequency distribution. To
illustrate this technique we shall undertake the location of percentile
limits in general. Consider the cumulative frequency curve of Figure
4.2 which we reproduce here as Figure 4.3. The procedure is as follows:
since $P_p$ is the upper limit of the pth percentile, which says that p per
cent of the observations are less than or equal to $P_p$, all we have to do is
locate p/100 on the relative cumulative frequency scale, draw a
horizontal line from this point to the ogive curve, and from here drop a
vertical line down to the horizontal axis, thus locating $P_p$. This is
illustrated in Figure 4.3 for two percentile limits.

[Figure 4.3: ogive with cumulative frequency (0 to 200) on the vertical axis and time in milliseconds (60 to 70) on the horizontal axis.]

FIG. 4.3. Cumulative frequency (ogive) "curve" plotted from Table 4.5.

4.10     THE MODE

Another value of aid in describing a sample is the mode. The mode is
defined as the value which occurs most frequently in the sample. The
mode of the sample will be denoted by MO. It should be obvious that
the mode will not always be a central value; in fact, it may often be an
extreme value. Then too, a sample may have more than one mode.
We should, at this time, distinguish between an absolute mode and a
relative mode. An absolute mode is what we defined above (there may, of
course, be more than one absolute mode); a relative mode is a value
which occurs more frequently than neighboring values even if it is not
an absolute mode.


Example 4.11

Given a sample consisting of five different values (among them 7, 1,
and 4), we may say there is no mode or there are five modes, since each
value occurs only once.

Example 4.12

Considering  the  data  of  Table  4.3,  we  see  there  are  two  absolute 
modes,  64.5  and  65.25  milliseconds,  since  each  of  these  values  occurs  14 
times  and  no  other  value  occurs  that  frequently. 
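
For ungrouped data, absolute modes can be found by counting occurrences. The following minimal Python sketch uses `collections.Counter`; the function name `absolute_modes` and the second sample are hypothetical.

```python
from collections import Counter


def absolute_modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())   # raises ValueError for an empty sample
    return sorted(v for v, c in counts.items() if c == highest)


print(absolute_modes([2, 3, 4, 6, 6, 7]))   # one absolute mode
print(absolute_modes([1, 1, 2, 2, 3]))      # two absolute modes
```

When every value occurs exactly once, the function returns all of them, which corresponds to the "no mode or as many modes as values" reading of Example 4.11.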

Example  4.13 

Given a frequency histogram like that shown in Figure 4.4, we would
say there are two relative modes: one in Class A and one in Class B.
However, the mode in Class A is the only absolute mode.

[Figure 4.4: histogram with two peaks, labeled Class A and Class B.]

FIG. 4.4. Example of a bimodal frequency histogram.

If our data are grouped in class intervals, it will be impossible to
locate the mode exactly. Under such circumstances, the best we can do
is to approximate the value of the mode. As was the case when
approximating the median, the first step is to locate the modal class. This is
accomplished quickly by picking out the class interval which shows the
highest frequency. The sample mode is then approximated by

$$MO \approx L_{MO} + \left( \frac{d_1}{d_1 + d_2} \right) w \tag{4.6}$$

where

$L_{MO}$ = lower limit of the modal class,
$d_1$ = the difference (sign neglected) between the frequency of the
modal class and the frequency of the preceding class,
$d_2$ = the difference (sign neglected) between the frequency of the
modal class and the frequency of the following class, and
$w$ = width of the modal class.

Example  4.14 

Consider the sample given in Table 4.4. The modal class is from 65 to
66 milliseconds with a frequency of 37. Therefore,

$$MO \approx 65 + \left( \frac{3}{3 + 10} \right)(1) = 65.23 \text{ milliseconds.}$$

Since the mode is, by definition, the most typical value, it is often
considered the most descriptive of the representative values discussed
so far. However, its importance diminishes as the number of
observations becomes limited.
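
The grouped-data approximation to the mode can be sketched as follows. The function name is hypothetical, and the neighbouring class frequencies in the usage line (34 and 27) are assumed for illustration; they were chosen so that the result agrees with the 65.23 milliseconds reported in Example 4.14.

```python
def grouped_mode(lower_limit, f_modal, f_previous, f_next, width):
    """Approximate the mode of grouped data from the modal class
    and the frequencies of its two neighbouring classes."""
    d1 = abs(f_modal - f_previous)   # difference with the preceding class
    d2 = abs(f_modal - f_next)       # difference with the following class
    if d1 + d2 == 0:
        raise ValueError("modal class does not stand out from its neighbours")
    return lower_limit + d1 / (d1 + d2) * width


# Modal class 65 to 66 with frequency 37; neighbouring frequencies assumed.
print(round(grouped_mode(65, 37, 34, 27, 1), 2))
```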

4.11     THE RANGE

All  the  representative  values  discussed  in  the  preceding  sections  have 
been  some  sort  of  average  or  measure  of  position.  It  must  be  clear, 
though,  that  they  are  not  sufficient  by  themselves  to  describe  most 
populations  or  samples  adequately.  This  statement  may  be  verified 
easily  if  we  consider  two  sets  of  data  which  have  the  same  mean,  the 
same  median,  and  the  same  mode  but  which  differ  greatly  in  the  amount 
of  variation  present  in  each  set  of  data.  It  would  seem  then  that  some 
measure  of  the  variation,  or  dispersion,  among  the  individual  values  is 
also  needed.  Several  such  measures  have  been  devised,  and  we  shall 
mention  four  of  these  in  this  and  succeeding  sections. 

A  measure  of  dispersion,  to  be  suitable,  should  be  large  when  the 
values  vary  over  a  wide  range  (and  there  are  quite  a  few  extreme 
values)  and  should  be  small  when  the  range  of  variation  is  not  too  great. 

The simplest measure of variation is one that has been mentioned
before, that is, the range. If we denote the smallest (minimum) sample
value by $X_{\min}$ and the largest (maximum) sample value by $X_{\max}$, the
sample range is given by

$$R = X_{\max} - X_{\min}. \tag{4.7}$$

The sample range, though easy to obtain, is often termed inefficient
because it ignores all the information available from the intermediate
sample values. However, for small samples (n < 10), the efficiency
(relative to other measures of variation yet to be defined) is quite high. For
a more explicit discussion of the efficiency of the range relative to the
standard deviation the reader is referred to Section 4.12. Thus we find
the sample range enjoying a favorable reception and wide use, because
of ease in computation, in such applications as statistical quality
control where small samples are the rule rather than the exception.

Example 4.15

For the sample given in Example 4.7, we obtain R = 7 - 2 = 5.

4.12     THE STANDARD DEVIATION AND VARIANCE

Perhaps the best known and most widely used measure of variability
is the standard deviation. Of almost equal importance is the square of
the standard deviation, this quantity being known as the variance. We
shall explain both of these measures by defining the variance.

The sample variance, denoted by $s^2$, is defined by

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}. \tag{4.8}$$

Sometimes it is convenient to let $x_i = X_i - \bar{X}$, that is, it is simpler to
denote deviations about the mean by lower-case letters. Then,

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2}{n - 1}. \tag{4.9}$$

However, the best form for machine calculation purposes is

$$s^2 = \frac{\sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 / n}{n - 1}. \tag{4.10}$$

The sample standard deviation is then defined as the positive square
root of the variance, namely,

$$s = \sqrt{s^2}. \tag{4.11}$$

The use of $n - 1$ (instead of $n$) when defining the sample variance
may seem peculiar to the reader, since we implicitly used a divisor of N
when defining the population variance. Our reason for using $n - 1$ is
this: In general, one prefers unbiased estimators² to biased estimators,
and the use of $n - 1$ gives us an unbiased estimator of $\sigma^2$. If $n$ were used,
the resulting function of the sample observations would produce biased
estimates of the unknown population variance; they would be biased
because, on the average, the estimates would be too small. Thus the
student of statistics must resign himself to remembering that, while the
population variance is defined using a divisor of N, the sample variance
requires a divisor of $n - 1$. Incidentally, we refer to $n - 1$ as the degrees
of freedom associated with the sample variance (and standard deviation).

Example  4.16 

For the sample (13, 5, 8, 5) we see that

$$s^2 = \{(13 - 7.75)^2 + (5 - 7.75)^2 + (8 - 7.75)^2 + (5 - 7.75)^2\}/3 = \{(5.25)^2 + 2(-2.75)^2 + (0.25)^2\}/3 = 14.25$$

and thus $s = \sqrt{14.25} = 3.775$.

If we had used the formula recommended for machine calculation,
the same value of $s^2$ would have been obtained:

$$s^2 = \{13^2 + 5^2 + 8^2 + 5^2 - (13 + 5 + 8 + 5)^2/4\}/3 = \{283 - (31)^2/4\}/3 = (283 - 240.25)/3 = 42.75/3 = 14.25.$$

² An estimator is a statistic, that is, a function of the sample values, which will
provide us with numerical estimates of a parameter.
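
Both routes to the sample variance are easy to check numerically. The sketch below, with hypothetical function names, computes the variance of the sample (13, 5, 8, 5) from the deviations about the mean and from the sums of the raw values and their squares, each with a divisor of n - 1.

```python
import math


def variance_from_deviations(xs):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_machine_form(xs):
    """Same quantity from the raw sums: {sum X^2 - (sum X)^2 / n} / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


sample = [13, 5, 8, 5]
s2 = variance_from_deviations(sample)
print(s2, variance_machine_form(sample))   # both routes should agree
print(round(math.sqrt(s2), 3))             # the standard deviation
```

The raw-sum form needs only one pass over the data, which is why it suited desk calculators; on a modern computer the deviation form is usually preferred because it loses less precision when the mean is large relative to the spread.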

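If the sample data appear in a frequency distribution, the following
forms are appropriate for calculation. When no class intervals are
involved (as in Table 4.3),

$$s^2 = \frac{\sum f X^2 - \left( \sum f X \right)^2 / n}{n - 1} \tag{4.12}$$

or

$$s^2 = \frac{\sum f Z^2 - \left( \sum f Z \right)^2 / n}{n - 1} \tag{4.12a}$$

where $X_0$ is an arbitrary origin, $n = \sum f$, and $Z = X - X_0$.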

Example  4.17 

Consider Table 4.6. Using Equation 4.12, we obtain
$$s^2 = \{12,700 - (450)^2/18\}/17 = 1450/17 = 85.3.$$
Using Equation 4.12a,
$$s^2 = \{1900 - (-90)^2/18\}/17 = 85.3.$$

TABLE 4.6-Illustration of the Use of Equations 4.12 and 4.12a

     X       f      fX       fX²       Z      fZ      fZ²
    10       3      30       300     -20     -60    1,200
    20       5     100     2,000     -10     -50      500
    30       8     240     7,200       0       0        0
    40       2      80     3,200      10      20      200
Totals      18     450    12,700     ...     -90    1,900

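When class intervals are involved, the appropriate formulas are:

$$s^2 = \frac{\sum f \xi^2 - \left( \sum f \xi \right)^2 / n}{n - 1} \tag{4.13}$$

and

$$s^2 = w^2 \left[ \frac{\sum f \zeta^2 - \left( \sum f \zeta \right)^2 / n}{n - 1} \right] \tag{4.13a}$$

in which $w$ is the width of a class interval, $\xi$ is the class midpoint, and
$\zeta = (\xi - \xi_0)/w$ where $\xi_0$ is an arbitrary origin.

TABLE 4.7-Illustration of the Use of Equations 4.13 and 4.13a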


Class Interval      f      ξ      fξ       fξ²      ζ      fζ     fζ²
 5 < X < 15         3     10      30       300     -2      -6      12
15 < X < 25         5     20     100     2,000     -1      -5       5
25 < X < 35         8     30     240     7,200      0       0       0
35 < X < 45         2     40      80     3,200      1       2       2
Totals             18    ...     450    12,700    ...      -9      19


Example 4.18

Consider Table 4.7. Setting $\xi_0 = 30$, it is seen that $s^2 = 85.3$. This may
be verified by evaluating

$$s^2 = \{12,700 - (450)^2/18\}/17$$

or

$$s^2 = (10)^2 [\{19 - (-9)^2/18\}/17].$$
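
The coded calculation for grouped data can be sketched in Python as follows. The function name is hypothetical; the midpoints and frequencies are those of Table 4.7, with origin 30 and class width 10. Setting the origin to 0 and the width to 1 gives the uncoded form, so the two calls should agree.

```python
def variance_from_frequency_table(midpoints, freqs, origin=0, width=1):
    """Sample variance from class midpoints and frequencies, using the
    coded value (midpoint - origin) / width for each class."""
    n = sum(freqs)
    coded = [(m - origin) / width for m in midpoints]
    sum_fz = sum(f * z for f, z in zip(freqs, coded))
    sum_fz2 = sum(f * z * z for f, z in zip(freqs, coded))
    # Undo the coding by multiplying by width squared; the origin drops out.
    return width ** 2 * (sum_fz2 - sum_fz ** 2 / n) / (n - 1)


midpoints = [10, 20, 30, 40]
freqs = [3, 5, 8, 2]
print(round(variance_from_frequency_table(midpoints, freqs, origin=30, width=10), 1))
print(round(variance_from_frequency_table(midpoints, freqs), 1))
```

Coding keeps the intermediate sums small (here -9 and 19 instead of 450 and 12,700), which is the practical reason for the arbitrary origin.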


It was mentioned in Section 4.11 that, for small n, the range is
reasonably efficient relative to the standard deviation. By this
statement was meant that if one wishes to estimate $\sigma$, it can be done using
either R or s. When sampling from a normal population, the efficiency
of the sample range relative to the sample standard deviation as an
estimator for the population standard deviation is given in Table 4.8.
As an example of the use of this table, if a person desires to use R
rather than s, he would estimate $\sigma$ by calculating

$$\hat{\sigma} = (R)\{\text{value of } \sigma/E(R) \text{ for given } n\}. \tag{4.14}$$

TABLE 4.8-Efficiency of Range (R) Relative to Standard Deviation (s) as
an Estimator of σ for a Normal Population

Sample Size (n)     Relative Efficiency     σ/E(R)
       2                  1.000             0.886
       3                  0.992             0.591
       4                  0.975             0.486
       5                  0.955             0.430
       6                  0.933             0.395
       7                  0.912             0.370
       8                  0.890             0.351
       9                  0.869             0.337
      10                  0.850             0.325
      12                  0.815             0.307
      14                  0.783             0.294
      16                  0.753             0.283
      18                  0.726             0.275
      20                  0.700             0.268
      30                  0.604             0.245
      40                  0.536             0.231
      50                  0.490             0.222
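
A minimal Python sketch of this range-based estimate follows. The dictionary copies the σ/E(R) column of Table 4.8; the function name is hypothetical, and the sample of Example 4.7 is used only to show the arithmetic, on the assumption that it may be treated as coming from a normal population.

```python
# sigma / E(R) for a normal population, copied from Table 4.8.
SIGMA_OVER_EXPECTED_RANGE = {
    2: 0.886, 3: 0.591, 4: 0.486, 5: 0.430, 6: 0.395, 7: 0.370,
    8: 0.351, 9: 0.337, 10: 0.325, 12: 0.307, 14: 0.294, 16: 0.283,
    18: 0.275, 20: 0.268, 30: 0.245, 40: 0.231, 50: 0.222,
}


def sigma_from_range(values):
    """Estimate sigma as R times the tabulated factor for this sample size."""
    n = len(values)
    if n not in SIGMA_OVER_EXPECTED_RANGE:
        raise ValueError(f"no tabulated factor for n = {n}")
    sample_range = max(values) - min(values)
    return sample_range * SIGMA_OVER_EXPECTED_RANGE[n]


print(sigma_from_range([2, 3, 4, 6, 6, 7]))   # R = 5, n = 6
```

Sample sizes that are not tabulated are rejected rather than interpolated, since the text gives no interpolation rule.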


4.13      THE  COEFFICIENT  OF  VARIATION 

The coefficient of variation has been explained by statisticians in
different ways. However, attention usually is called to the rather
obvious fact that things with large values tend to vary widely, while things
with small values exhibit small (numerically small, that is) variation.
Thus, to afford a valid comparison of the variation among large values
and the variation among small values, such as the variation among
salaries of industrial executives and the variation among the wages of
day laborers, the variation is expressed as a fraction of the mean, and
frequently as a percentage. This measure of relative variation is called
the coefficient of variation and is defined as

$$CV = s/\bar{X}. \tag{4.15}$$

In percentage form, this becomes

$$100\,CV = 100(s/\bar{X}) \text{ per cent.} \tag{4.16}$$

TABLE 4.9-Special Form for Calculating and Presenting Sample Statistics

$n$                   ........
$\sum X$              ........
$\bar{X}$             ........
$\sum X^2$            ........
$(\sum X)^2/n$        ........
$\sum x^2$            ........
$s^2$                 ........
$s$                   ........
$X_{\max}$            ........
$X_{\min}$            ........
$R$                   ........

SPECIAL NOTES

FORMULAE

$s^2 = \sum x^2/(n - 1)$
$R = X_{\max} - X_{\min}$ = largest observation - smallest observation.

The coefficient of variation is, of course, an ideal device for comparing
the variation in two series of data which are measured in two different
units; e.g., a comparison of variation in height with variation in weight.

Example  4.19 

For the sample given in Example 4.16 we see that CV = 3.775/7.75
= 0.4871, and in percentage form 100 CV = 48.71 per cent.

4.14      SUMMARY 

The greater part of this chapter has been devoted to outlining
methods of calculation for various statistics, i.e., functions of sample values,
which are useful in statistical inference. Not all of the statistics
discussed are used in everyday applications. However, a select few are used
so often that it is convenient to have a standard form for calculation
and presentation of results. One such form is presented in Table 4.9.

Problems

4.1  Plot a frequency histogram and polygon for the following data. Make
approximate eye-estimates of the arithmetic mean, median, and mode.

WEEKLY WAGES OF 188 FEMALE EMPLOYEES OF
A SHOE MANUFACTURING COMPANY

$20.15    $25.00    $40.39    $25.49    $25.70
 24.15     22.54     23.80     29.60     18.74
 25.62     23.89     28.37     26.00     16.70
 26.00     27.82     24.80     26.52     28.09
 27.84     25.80     25.88     25.04     24.98
 22.97     23.20     23.24     29.00     24.55
 25.48     20.88     21.70     25.76     26.20
 28.00     28.92     27.92     25.80     22.45
 28.24     25.70     22.75     21.40     27.10
 31.37     26.77     26.00     18.64     27.39
 24.53     24.25     28.28     30.32     23.00
 28.13     26.23     21.55     28.04     25.58
 22.78     26.88     26.64     22.83     23.45
 25.20     29.29     25.62     23.40     26.12
 27.08     24.40     25.49     30.48     27.03
 26.11     21.80     20.85     26.79     26.25
 22.04     22.54     21.85     25.65     27.50
 29.48     25.20     26.00     22.69     25.78
 21.77     24.32     26.00     22.52     17.50
 26.52     20.48     22.92     23.96     26.00
 22.00     22.44     26.00     26.35     25.64
 22.48     27.25     24.19     23.75     28.94
 21.85     22.99     22.33     24.18     25.65
 23.12     22.71     26.48

4.2  Plot a frequency histogram and polygon for the data given below.
PER CENT SILICON IN 236 SUCCESSIVE CASTS OF PIG IRON

4.3  A random sample of 201 women students was obtained and their
heights and weights were recorded as follows:

HEIGHTS AND WEIGHTS OF 201 WOMEN STUDENTS AGED 18
UNIVERSITY OF BRITISH COLUMBIA, 1944-45*

* Source: U.B.C. Students' Health Service.

Plot a frequency histogram and polygon for (a) the heights and
(b) the weights.

4.4  Plot a cumulative frequency curve for the data of Problem 4.1.
Estimate the median from this curve.

4.5  Plot a cumulative frequency curve for the data of Problem 4.2.
Estimate the median from this curve.

4.6  Plot a cumulative frequency curve for (a) the heights and (b) the
weights given in Problem 4.3. From these curves, estimate the median
height and median weight.

4.7  Given the samples listed below, calculate for each the mean, median,
mode, midrange, range, variance, standard deviation, and coefficient
of variation:

(a) 5, 19, -3, 7, 1, 1
(b) 5, -3, 2, 0, 8, 6
(c) 6, 9, 5, 3, 6, 7
(d) 1, 3, 2, -1, 5
(e) 10, 15, 14, 15, 16
(f) 0, 5, 10, -3
(g) 8, 7, 15, -2, 0

4.8  Suppose that $\bar{Y}$ = 100 and $s^2$ = 15. What would the values of $\bar{Y}$ and $s^2$
become if each original observation were (a) increased by 10 units,
(b) multiplied by 10?

4.9  Given  the  observations: 


2 

10 

3 

10 

4 

20 

5 

50 

6 

30 

Calculate  the  same  statistics  as  asked  for  in  Problem  4.7. 

4.10  Calculate the same statistics as asked for in Problem 4.7 for each
of the following sets of data:

(a) Problem 4.1
(b) Problem 4.2
(c) Problem 4.3.
4.11  A bag of potatoes was sampled for quality, five potatoes being selected
at random from the bag. Among the observations recorded were the
weights of the potatoes: 17, 15, 10, 12, and 11 ounces. Calculate
$\bar{Y}$, $s^2$, $s$, and $R$. What property (using the word rather loosely) is
common to the sample range and the sample variance (or standard
deviation)?

4.12  Given that $n = 25$, $\sum y^2 = 600$, and $\bar{Y} = 204$, calculate the variance,
standard deviation, and coefficient of variation.


CHAPTER 5

SAMPLING DISTRIBUTIONS

CERTAIN SAMPLING DISTRIBUTIONS pertinent to methods to be
presented in later chapters will be discussed in this chapter. The law of
large numbers, Tchebycheff's inequality, and the central limit theorem
will be given. Various approximations to exact sampling distributions
will also be considered.

5.1      SAMPLE   MOMENTS 

In the preceding chapter, the calculation of several different sample
statistics was outlined. Of particular importance are those statistics
known as sample moments. They are defined by

$$m_k' = \sum_{i=1}^{n} X_i^k / n \tag{5.1}$$

and

$$m_k = \sum_{i=1}^{n} (X_i - \bar{X})^k / n \tag{5.2}$$

where k = 0, 1, 2, .... In particular, it is noted that $m_1' = \bar{X}$. The
reader will see the similarity of the above definitions to the definitions
of $\mu_k'$ and $\mu_k$ given in Definitions 3.23 and 3.25. It may be shown that
$E(m_k') = \mu_k'$ for all k but $E(m_k)$ does not equal $\mu_k$ except for k = 0, 1. For
this reason, $m_k'$ is known as an unbiased estimator of $\mu_k'$ (k = 0, 1, ...)
while $m_k$ is a biased estimator of $\mu_k$ (k = 2, 3, ...). Incidentally, this
last remark is just a restatement of the reason given in the preceding
chapter for using $s^2$ rather than $m_2$ as an estimator of $\sigma^2$.
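
To make the relation between the second central sample moment and $s^2$ concrete, here is a small Python sketch. It takes the kth sample moment about the origin as the average of the kth powers and the kth central sample moment as the average of the kth powers of the deviations from the sample mean; the function names are hypothetical.

```python
def raw_moment(xs, k):
    """kth sample moment about the origin: average of X**k."""
    return sum(x ** k for x in xs) / len(xs)


def central_moment(xs, k):
    """kth sample moment about the sample mean."""
    mean = raw_moment(xs, 1)
    return sum((x - mean) ** k for x in xs) / len(xs)


sample = [13, 5, 8, 5]                 # sample of Example 4.16
n = len(sample)
m2 = central_moment(sample, 2)         # divisor n
s2 = m2 * n / (n - 1)                  # rescaled to the divisor n - 1
print(raw_moment(sample, 1), m2, s2)
```

The second central moment is smaller than $s^2$ by the factor (n - 1)/n, which is the downward bias referred to in the text.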


5.2     VARIANCE  OF  THE  SAMPLE  MEAN 

It has just been stated that $E(\bar{X}) = \mu$. That is, in sampling from a
specified population, the expected value (or average) of all possible
sample means is the population mean. However, we realize that it is
equally important to know something about the variation among all
possible values of the sample mean. To investigate this variation,
consider the variance of the sample mean,

$$\sigma_{\bar{X}}^2 = E[(\bar{X} - \mu)^2]. \tag{5.3}$$

After some algebraic manipulation, it is seen that

$$\sigma_{\bar{X}}^2 = \left\{ \sum_{i=1}^{n} E[(X_i - \mu)^2] + n(n - 1)E[(X_i - \mu)(X_j - \mu)] \right\} / n^2 = \{\sigma^2 + (n - 1)E[(X_i - \mu)(X_j - \mu)]\}/n, \quad i \neq j. \tag{5.4}$$

Two cases must now be distinguished: (1) random sampling and (2)
sampling without replacement from a finite population. For these two
cases, we obtain, respectively,

$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \tag{5.5}$$

and

$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}, \tag{5.6}$$

N being the size of the population.

The preceding result is very important. It says that, no matter what
the population (as long as it has a finite variance), the distribution of
the sample mean becomes more and more concentrated in the
neighborhood of the population mean as the sample size is increased. That is,
the larger the sample size, the more certain we become that the sample
mean will be close to the (unknown) population mean. This result will
be expressed more precisely in the following section.
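
For a very small population, both cases can be checked by brute force. The sketch below is only an illustration: the population 1, 2, 3, 4, 5 and the function name are hypothetical. It enumerates every equally likely sample of size 2 and computes the variance of the sample mean exactly, using `fractions.Fraction` to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations, product


def variance_of_sample_mean(population, n, with_replacement):
    """Exact variance of the sample mean over all equally likely samples."""
    size = len(population)
    mu = Fraction(sum(population), size)
    if with_replacement:
        samples = product(population, repeat=n)    # ordered draws
    else:
        samples = combinations(population, n)      # distinct groups of n items
    squared = [(Fraction(sum(s), n) - mu) ** 2 for s in samples]
    return sum(squared) / len(squared)


population = [1, 2, 3, 4, 5]          # N = 5, mean 3, variance 2 (divisor N)
print(variance_of_sample_mean(population, 2, with_replacement=True))
print(variance_of_sample_mean(population, 2, with_replacement=False))
```

Sampling with replacement gives the population variance divided by n, while sampling without replacement gives that value multiplied by (N - n)/(N - 1), here 3/4.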

5.3     TCHEBYCHEFF'S    INEQUALITY 

A useful inequality is that due to Tchebycheff, namely

$$P(|X - \mu| \geq k\sigma) \leq 1/k^2. \tag{5.7}$$

This inequality is often expressed in the following alternative form:

$$P(|X - \mu| < k\sigma) \geq 1 - 1/k^2 \tag{5.8}$$

or

$$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - 1/k^2. \tag{5.9}$$

Tchebycheff's inequality shows how $\sigma$ may be used as a measure of
variation. It can be applied in a wide variety of cases for it assumes only
the existence of $\mu$ and $\sigma^2$. That is, no assumption is made concerning
the form of the population but only that the mean and variance exist.
If we restrict our attention to unimodal distributions, the inequality
may be sharpened. Under such a restriction, we obtain

$$P(|X - MO| \geq kB) \leq 4/(9k^2) \tag{5.10}$$

where MO is the mode and $B^2 = \sigma^2 + (MO - \mu)^2$. An alternative form is

$$P(|X - \mu| \geq k\sigma) \leq \frac{4(1 + \lambda^2)}{9(k - |\lambda|)^2} \tag{5.11}$$

where $\lambda = (\mu - MO)/\sigma$. It should be noted that if the distribution is
not only unimodal but also symmetric, that is, $\mu = MO$, then Equations
(5.10) and (5.11) reduce to

$$P(|X - \mu| \geq k\sigma) \leq 4/(9k^2). \tag{5.12}$$
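
A quick numerical check may help fix ideas. The sketch below takes a fair six-sided die as a hypothetical population, computes the exact probability of a deviation of at least k standard deviations from the mean, and compares it with the bound 1/k². Squares are compared so that no square root is needed, and the function name is an assumption of the sketch.

```python
from fractions import Fraction


def tail_probability(outcomes, k):
    """Exact P(|X - mu| >= k * sigma) when all outcomes are equally likely."""
    n = len(outcomes)
    mu = Fraction(sum(outcomes), n)
    variance = sum((x - mu) ** 2 for x in outcomes) / n
    # |x - mu| >= k * sigma is equivalent to (x - mu)^2 >= k^2 * variance.
    hits = sum(1 for x in outcomes if (x - mu) ** 2 >= k * k * variance)
    return Fraction(hits, n)


die = [1, 2, 3, 4, 5, 6]
for k in (Fraction(1), Fraction(5, 4), Fraction(3, 2), Fraction(2)):
    print(k, tail_probability(die, k), 1 / (k * k))
```

For this population the actual tail probabilities fall well below the bound, which is typical: the inequality holds for every population with a finite variance and is therefore seldom tight for any particular one.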

5.4      LAW  OF    LARGE  NUMBERS 

It is now possible to give precise formulation to the law of large
numbers. Invoking Tchebycheff's inequality with respect to the sample
mean, we have

$$P(|\bar{X} - \mu| \geq k\sigma/\sqrt{n}) \leq 1/k^2 \tag{5.13}$$

or

$$P(\mu - k\sigma/\sqrt{n} < \bar{X} < \mu + k\sigma/\sqrt{n}) \geq 1 - 1/k^2. \tag{5.14}$$

Setting $K = k\sigma/\sqrt{n}$, it is seen that

$$P(\mu - K < \bar{X} < \mu + K) \geq 1 - \sigma^2/(nK^2). \tag{5.15}$$

Thus, when sampling from any population with a finite variance, the
sample size may be chosen large enough to make it almost certain that
the sample mean will be arbitrarily close to the population mean. This
is what is known as the law of large numbers.

In reliability and quality control work, much attention is given to
the number of defective items in a sample of size n. Thus it will be of
interest to see how the law of large numbers provides information in
such a case. If we assume random sampling from a binomial population
in which $\mu = p$ and $\sigma^2 = p(1 - p)$, then, as n gets large,

$$P(|x/n - p| < \epsilon) \to 1 \tag{5.16}$$

where x is the number of defective items observed in a sample of size n
and $\epsilon$ is an arbitrarily small positive quantity. That is, as n increases,
we become more and more certain that the observed fraction defective
will be a good estimate of the true fraction defective in the population.
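
Applying Tchebycheff's inequality to the sample mean with $K = k\sigma/\sqrt{n}$ gives the bound $P(|\bar{X} - \mu| < K) \geq 1 - \sigma^2/(nK^2)$, which can be turned around to ask how large n must be. The Python sketch below does this; the function name and the numerical inputs are hypothetical, and the resulting n is a conservative guarantee rather than a recommended design value.

```python
import math


def sample_size_for_mean(sigma, K, probability):
    """Smallest n for which the Tchebycheff bound guarantees
    P(|sample mean - mu| < K) >= probability."""
    if sigma <= 0 or K <= 0 or not 0 < probability < 1:
        raise ValueError("need sigma > 0, K > 0 and 0 < probability < 1")
    # 1 - sigma^2 / (n K^2) >= probability  <=>  n >= sigma^2 / (K^2 (1 - probability))
    return math.ceil(sigma ** 2 / (K ** 2 * (1 - probability)))


print(sample_size_for_mean(sigma=1.0, K=0.5, probability=0.75))
```

For a fraction defective, the variance p(1 - p) never exceeds 1/4, so taking sigma = 0.5 gives a sample size that works whatever the true fraction may be.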

5.5     CENTRAL LIMIT THEOREM

Without doubt, the most important theorem in statistics is the
central limit theorem. It is important not only from the theoretical point
of view but also because of its impact on statistical methods. Since a
proof of this theorem is beyond the scope of this text, it will be stated
without proof. Here, then, is the theorem:

If a population has a finite variance $\sigma^2$ and mean $\mu$, then the
distribution of the sample mean approaches the normal distribution with
variance $\sigma^2/n$ and mean $\mu$ as the sample size n increases.

Note that nothing is said about the form of the sampled population.
That is, no matter what the form of the sampled population, provided
only that it has a finite variance, the sample mean will be
approximately normally distributed. This is indeed a remarkable theorem.

5.6     RANDOM SAMPLING FROM A SPECIFIED POPULATION

Suppose a random sample of n observations is obtained from a given
population. The joint probability density function for $(X_1, X_2, \ldots, X_n)$
will be represented by $g(x_1, \ldots, x_n)$. Now, it will be remembered that
a random sample implies statistically independent observations.
Further, for statistically independent variables, the joint probability
density function may be expressed as the product of the marginal
densities. Thus,

$$g(x_1, \ldots, x_n) = g_1(x_1) \cdot g_2(x_2) \cdots g_n(x_n) \tag{5.17}$$

and, since each observation came from the same population,

$$g(x_1, \ldots, x_n) = f(x_1) \cdot f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i) \tag{5.18}$$

where f(x) is the probability density function describing the sampled
population. It should be noted that Equation (5.18) gives the joint
probability density function of the sample in the order drawn.
Incidentally, the function

$$\prod_{i=1}^{n} f(x_i)$$

is often referred to as the likelihood function.

5.7     THE   HYPERGEOMETRIC   DISTRIBUTION 

In many instances, the type of sampling performed in industrial
applications is the selection of a sample of n items out of a lot of N items.
This selection is usually done in such a manner that the sampling is
without replacement. Thus we have a "random" sample only in the
specialized sense that every possible group of n items in the lot has the
same chance of comprising the sample. In such a case, if x represents
the number of defective items in the sample,

$$f(x) = C(D, x) \cdot C(N - D, n - x)/C(N, n); \quad x = a, a + 1, \ldots, b - 1, b \tag{5.19}$$

$$a = \max(0, n - N + D), \qquad b = \min(D, n)$$

where D represents the number of defective items in the lot.

The distribution specified by Equation (5.19) is known as the
hypergeometric distribution. It is the distribution underlying practically all
acceptance sampling by attributes where an item of product is classified
as either defective or nondefective. The reader should become very
familiar with the hypergeometric distribution and be competent in
evaluating probabilities associated with it.

Using the theory of earlier chapters, it is seen that

$$\mu = E[X] = nD/N \qquad (5.20)$$

and

$$\sigma^2 = E[(X - \mu)^2] = \frac{nD(N - D)(N - n)}{N^2 (N - 1)}. \qquad (5.21)$$


Thus,  as  expected,  the  average  number  of  defective  items  in  a  sample 
is  equal  to  the  size  of  the  sample  multiplied  by  the  fraction  of  defective 
items  in  the  lot. 
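Equations (5.19) through (5.21) can be evaluated directly. The sketch below is illustrative only: the lot size, number of defectives, and sample size are made-up values, and `hypergeom_pmf` is a hypothetical helper name. It sums the probability function over its support and compares the resulting mean and variance with the closed forms.

```python
from math import comb


def hypergeom_pmf(x, N, D, n):
    """P{X = x}: x defectives in a sample of n drawn without replacement
    from a lot of N items containing D defectives (Equation 5.19)."""
    a, b = max(0, n - N + D), min(D, n)
    if x < a or x > b:
        return 0.0
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)


N, D, n = 50, 5, 10   # illustrative lot, not taken from the text
support = range(max(0, n - N + D), min(D, n) + 1)
probs = [hypergeom_pmf(x, N, D, n) for x in support]

mean = sum(x * p for x, p in zip(support, probs))
var = sum((x - mean) ** 2 * p for x, p in zip(support, probs))

print("total probability:", sum(probs))
print("mean    :", mean, " closed form (5.20):", n * D / N)
print("variance:", var, " closed form (5.21):",
      n * D * (N - D) * (N - n) / (N ** 2 * (N - 1)))
```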

5.8     THE BINOMIAL DISTRIBUTION

Suppose that a random sample of size n is selected from an infinite
binomial population described by

$$g(y) = p^{y} (1 - p)^{1 - y}; \qquad y = 0, 1; \qquad 0 < p < 1 \qquad (5.22)$$

or that a random sample of size n is selected (using sampling with
replacement) from a finite population of N items, D of which are
defective. In the latter case we can, therefore, let $p = D/N$.

Both of the sampling situations described above lead to the same
sampling distribution of X, where X represents the number of defective
items in a sample of n items. This distribution is described by the p.f.

$$f(x) = C(n, x)\, p^{x} (1 - p)^{n - x}; \qquad x = 0, 1, \ldots, n; \qquad 0 < p < 1. \qquad (5.23)$$

Using theory already developed, it can easily be shown that

$$\mu = E[X] = np \qquad (5.24)$$

and

$$\sigma^2 = E[(X - \mu)^2] = np(1 - p) = npq \qquad (5.25)$$

where $q = 1 - p$. These results will prove useful in later work.

Probabilities associated with the binomial density of Equation (5.23)
or with its cumulative form have been published by the National
Bureau of Standards (4), Robertson (7), and Romig (8). Reference will
be made to such tables as the need arises.

5.9      BINOMIAL APPROXIMATION TO THE
HYPERGEOMETRIC

Under certain conditions it is permissible to use the binomial
distribution as an approximation for the hypergeometric distribution. This
approximation is usually invoked to simplify numerical calculations.
To see how the approximation is justified, consider the hypergeometric
distribution

$$f(x) = C(D, x)\, C(N - D,\, n - x) / C(N, n). \qquad (5.26)$$

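Writing this out in detail, we obtain

$$f(x) = C(n, x) \left\{ \left[ \frac{D}{N} \cdot \frac{D - 1}{N - 1} \cdots \frac{D - (x - 1)}{N - (x - 1)} \right] \left[ \frac{N - D}{N - x} \cdot \frac{N - D - 1}{N - x - 1} \cdots \frac{N - D - (n - x - 1)}{N - (n - 1)} \right] \right\}. \qquad (5.27)$$

Setting $D = pN$ and dividing the numerator and denominator of each
factor inside the braces by N, it is seen that

$$f(x) = C(n, x) \left\{ \left[ p \cdot \frac{p - 1/N}{1 - 1/N} \cdots \frac{p - (x - 1)/N}{1 - (x - 1)/N} \right] \left[ \frac{q}{1 - x/N} \cdot \frac{q - 1/N}{1 - (x + 1)/N} \cdots \frac{q - (n - x - 1)/N}{1 - (n - 1)/N} \right] \right\} \qquad (5.28)$$

where $q = 1 - p$. Letting N get very large, it is clear that

$$f(x) \to C(n, x)\, p^{x} q^{n - x}. \qquad (5.29)$$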

That is, if N is large, the hypergeometric distribution may be
approximated by the binomial distribution. The question of how large N
should be relative to n before using the binomial approximation is one
which must be answered. First, since tables of logarithms of factorials
are not available for k greater than 2000, calculation of the
hypergeometric will be extremely tedious for such cases. Second, and perhaps
more to the point, Burr (1) has said that if the lot size N is at least
eight times the sample size n, it will be satisfactory to use the binomial
as an approximation to the hypergeometric. However, since Burr's
statement is only a general comment with no reference to the magnitude
of the error involved, it seems only fair to say that each individual
case must be considered on its own merits.
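One way to consider a case "on its own merits" is simply to compute both distributions and look at the largest discrepancy. The sketch below does this for a few lot-size-to-sample-size ratios; the values n = 20 and p = 0.1, the helper names, and the choice of the maximum pointwise error as the yardstick are all assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(x, N, D, n):
    """Hypergeometric p.f. of Equation (5.26); zero outside the support."""
    if x < max(0, n - N + D) or x > min(D, n):
        return 0.0
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    """Binomial p.f. of Equation (5.29)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def max_abs_error(N, n, p):
    """Largest pointwise gap between the hypergeometric p.f. and its
    binomial approximation, with D = pN rounded to an integer."""
    D = round(p * N)
    return max(abs(hypergeom_pmf(x, N, D, n) - binom_pmf(x, n, D / N))
               for x in range(n + 1))


n, p = 20, 0.1   # illustrative sample size and fraction defective
for ratio in (2, 8, 100):
    N = ratio * n
    print(f"N = {ratio:>3} n: max error = {max_abs_error(N, n, p):.5f}")
```

The error shrinks as N/n grows, which is the behavior Equation (5.29) describes; whether the error at N = 8n is acceptable still depends on the application.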

5.10   POISSON  APPROXIMATION  TO  THE  BINOMIAL 

In instances when we do not have access to published tables of the
binomial distribution, it becomes necessary to find some way of
obtaining the required probabilities without excessive calculation. In
such cases, we usually seek some form of approximation to the binomial
which involves less computation or is associated with tables that are
more readily available. Two such approximations involve, respectively,
the Poisson and normal distributions. The first of these will be
discussed in the present section, while the normal approximation will be
examined in Section 5.11.

If p is very small (less than 0.1) and n is quite large (greater than 50),
it is sometimes convenient to approximate the binomial p.f. by the
Poisson p.f. in which $\mu = np$. To see how this approximation is
justified, consider the following argument. In

$$f(x) = C(n, x)\, p^{x} (1 - p)^{n - x} = \frac{n(n - 1) \cdots (n - x + 1)}{x!}\, p^{x} (1 - p)^{n - x} \qquad (5.30)$$

set $p = \mu/n$. Then,

$$f(x) = \frac{n(n - 1) \cdots (n - x + 1)}{x!} \left(\frac{\mu}{n}\right)^{x} \left(1 - \frac{\mu}{n}\right)^{n - x} = \left(\frac{n}{n}\right) \left(\frac{n - 1}{n}\right) \cdots \left(\frac{n - x + 1}{n}\right) \frac{\mu^{x}}{x!} \left(1 - \frac{\mu}{n}\right)^{n} \left(1 - \frac{\mu}{n}\right)^{-x}. \qquad (5.31)$$

If we let $n \to \infty$ and $p \to 0$ such that $np = \mu$ remains constant,

$$f(x) \to (1)(1) \cdots (1)\, \frac{\mu^{x}}{x!}\, e^{-\mu} (1) = \frac{e^{-\mu} \mu^{x}}{x!} \qquad (5.32)$$

which is the Poisson probability function.

Therefore, if np is large relative to p and n is large relative to np, the
Poisson may be used as a reasonable approximation to the binomial.
All that is necessary is to set $\mu$ in the Poisson distribution equal to np
of the binomial distribution we are attempting to approximate. In
other words, the means of the two distributions have been equated.
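The quality of the Poisson approximation is easy to inspect numerically. The following sketch tabulates both probability functions for one assumed case, n = 100 and p = 0.02 (so that $\mu = np = 2$); these values and the helper names are illustrative and do not come from the text.

```python
from math import comb, exp, factorial


def binom_pmf(x, n, p):
    """Binomial p.f., C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    """Poisson p.f. of Equation (5.32), e^(-mu) mu^x / x!."""
    return exp(-mu) * mu ** x / factorial(x)


n, p = 100, 0.02      # small p, large n (illustrative values)
mu = n * p            # equate the means of the two distributions

print(" x   binomial   Poisson")
for x in range(7):
    print(f"{x:2d}   {binom_pmf(x, n, p):.5f}    {poisson_pmf(x, mu):.5f}")

worst = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(n + 1))
print("largest pointwise difference:", worst)
```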

5.11      NORMAL  APPROXIMATION  TO  THE  BINOMIAL 

The binomial distribution may also be approximated by the normal
distribution. As in the preceding section, the sample size should be
reasonably large before the approximation is employed.

To illustrate the nature of the approximation, consider Figure 5.1.
Here, the binomial distribution for n = 10 and p = 1/2 is pictured by the
ordinates at the various values of x. If rectangles of width one are
erected as shown, the area of the histogram equals 1. This is just an
alternative way of expressing the fact that the sum of the ordinates
equals 1. Using areas under the normal curve, probabilities associated
with various x values may be closely approximated.

[Figure 5.1; horizontal axis: X = number of failures in sample of size 10]

FIG. 5.1. Binomial distribution for n = 10 and p = 1/2 (solid line
ordinates), area representation (dotted line rectangles),
and the normal approximation.

In order to evaluate probabilities associated with a normal
distribution, the mean and variance must be known. To specify the mean and
variance of the approximating normal, let $\mu = np$ and $\sigma^2 = np(1 - p)
= npq$, where np and npq are the mean and variance, respectively, of
the binomial distribution to be approximated. Then, for any integers a
and b (a < b) in the closed interval [0, n], the approximation takes the
form:


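$$P\{a \le X \le b\} \cong P\left\{ \frac{(a - \tfrac{1}{2}) - np}{\sqrt{npq}} < Z < \frac{(b + \tfrac{1}{2}) - np}{\sqrt{npq}} \right\} \qquad (5.33)$$

$$P\{a < X < b\} \cong P\left\{ \frac{(a + \tfrac{1}{2}) - np}{\sqrt{npq}} < Z < \frac{(b - \tfrac{1}{2}) - np}{\sqrt{npq}} \right\} \qquad (5.34)$$

$$P\{a < X \le b\} \cong P\left\{ \frac{(a + \tfrac{1}{2}) - np}{\sqrt{npq}} < Z < \frac{(b + \tfrac{1}{2}) - np}{\sqrt{npq}} \right\} \qquad (5.35)$$

or

$$P\{a \le X < b\} \cong P\left\{ \frac{(a - \tfrac{1}{2}) - np}{\sqrt{npq}} < Z < \frac{(b - \tfrac{1}{2}) - np}{\sqrt{npq}} \right\}. \qquad (5.36)$$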

Other illustrations could be given, but the foregoing, together with the
examples which follow, should be sufficient. The important thing to
note is that 1/2 is added to or subtracted from the limit so as to include
or exclude a or b, the proper choice being indicated by the nature of the
inequality. This adding or subtracting of 1/2 is often referred to as a
"correction for continuity."

Example  5.1 

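A random sample of 100 observations is drawn from a binomial
population in which p = 0.2. Evaluate $P\{10 \le X \le 25\}$. Here $np = 20$
and $\sqrt{npq} = 4$. We say that

$$\begin{aligned}
P\{10 \le X \le 25\} &\cong P\left\{ \frac{9.5 - 20}{4} < Z < \frac{25.5 - 20}{4} \right\} \\
&= P\{-2.62 < Z < 1.38\} \\
&= G(1.38) - G(-2.62) = 0.91621 - 0.00440 \\
&= 0.91181.
\end{aligned}$$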

Example  5.2 

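Referring to Example 5.1, evaluate $P\{10 < X \le 25\}$. We have

$$\begin{aligned}
P\{10 < X \le 25\} &\cong P\left\{ \frac{10.5 - 20}{4} < Z < \frac{25.5 - 20}{4} \right\} \\
&= P\{-2.37 < Z < 1.38\} = G(1.38) - G(-2.37) \\
&= 0.91621 - 0.00889 = 0.90732.
\end{aligned}$$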

Example  5.3 

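Referring to Example 5.1, evaluate $P\{X > 26\}$. Proceeding as before,

$$\begin{aligned}
P\{X > 26\} &\cong P\left\{ Z > \frac{26.5 - 20}{4} \right\} = P\{Z > 1.62\} \\
&= 1 - G(1.62) = 1 - 0.94738 = 0.05262.
\end{aligned}$$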
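The three examples can be reproduced by machine. The sketch below is an illustration, not the book's procedure: `normal_approx` and `exact_binomial` are hypothetical helper names, strict inequalities are first rewritten as inclusive integer limits, and the normal c.d.f. is evaluated at unrounded z values, so the results differ from the table-based answers above in the third or fourth decimal place.

```python
from math import comb, sqrt
from statistics import NormalDist


def normal_approx(n, p, lo=None, hi=None):
    """Approximate P{lo <= X <= hi} for a binomial X using the continuity
    correction; lo and hi are inclusive integers, None means unbounded."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    z = NormalDist()
    upper = 1.0 if hi is None else z.cdf((hi + 0.5 - mu) / sigma)
    lower = 0.0 if lo is None else z.cdf((lo - 0.5 - mu) / sigma)
    return upper - lower


def exact_binomial(n, p, lo, hi):
    """Exact P{lo <= X <= hi} by summing the binomial p.f."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(lo, hi + 1))


n, p = 100, 0.2
ex1 = normal_approx(n, p, 10, 25)      # P{10 <= X <= 25}
ex2 = normal_approx(n, p, 11, 25)      # P{10 <  X <= 25}
ex3 = normal_approx(n, p, 27, None)    # P{X > 26}

print("Example 5.1:", ex1, " exact:", exact_binomial(n, p, 10, 25))
print("Example 5.2:", ex2, " exact:", exact_binomial(n, p, 11, 25))
print("Example 5.3:", ex3, " exact:", exact_binomial(n, p, 27, n))
```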

It is reasonable to ask what error is involved in using the
approximation just described. Mood (3) has said that, if npq > 25, the error is less
than $0.15/\sqrt{npq}$. However, we should realize that, for a given n, the
normal curve gives a better approximation when p is close to 1/2 than
when p is close to 0 or 1. On the other hand, if n is large enough (say
100 or more), the approximation will be satisfactory for most values of
p. If p is very close to 0 or 1, the approximation will be less reliable in
the tails than near the center of the distribution. Thus, in reliability
work, where very small values of p are frequently encountered, the
normal approximation may not be too good and one should use either
the Poisson approximation or calculate exact probabilities.

5.12     THE   MULTINOMIAL   DISTRIBUTION 

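If a random sample of size n is taken from the multinomial
population described by

$$g(y_1, \ldots, y_k) = p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k}; \qquad 0 < p_i < 1; \qquad y_i = 0, 1; \qquad \sum_{i=1}^{k} p_i = 1, \quad \sum_{i=1}^{k} y_i = 1 \qquad (5.37)$$

a multinomial distribution is obtained. This distribution is defined by

$$f(x_1, \ldots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}; \qquad 0 < p_i < 1; \qquad \sum_{i=1}^{k} p_i = 1; \qquad x_i = 0, 1, \ldots, n; \qquad \sum_{i=1}^{k} x_i = n \qquad (5.38)$$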

where $x_i$ is the number of items occurring in the class associated with $p_i$.
The number $p_i$ is the probability of any item being assigned to the ith
class and it is, of course, the fraction of the total population belonging
to the ith class. For example, an item of product may be assigned to
one of four classes: good, minor defect, major defect, or critical defect.
Then, the n sample items would be classified into the four groups upon
inspection. The number falling in the first group would be denoted by
$x_1$, the number in the second group by $x_2$, and so on.
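The four-class inspection example can be sketched in code as follows. The class probabilities and the sample size are invented for illustration, and `multinomial_pmf` is a hypothetical helper implementing the standard multinomial probability function.

```python
from itertools import product
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing the given class counts in a sample of
    n = sum(counts) items, for class probabilities `probs`."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)       # exact integer multinomial coefficient
    return coef * prod(p ** x for p, x in zip(probs, counts))


# Hypothetical probabilities: good, minor, major, critical defect.
probs = (0.90, 0.06, 0.03, 0.01)
n = 5

# Probability that a sample of 5 holds 4 good items and 1 minor defect.
print(multinomial_pmf((4, 1, 0, 0), probs))

# The p.f. summed over every way of splitting n items into 4 classes is 1.
total = sum(multinomial_pmf(c, probs)
            for c in product(range(n + 1), repeat=len(probs))
            if sum(c) == n)
print("total probability:", total)
```

With only two classes the same function reduces to the binomial p.f. of Section 5.8.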

5.13     THE  NEGATIVE  BINOMIAL  DISTRIBUTION  AND 
THE  GEOMETRIC   DISTRIBUTION 

A sampling distribution encountered fairly often in industrial
applications is that known as the negative binomial distribution. Suppose p is
the probability of a defective item and $q = 1 - p$ is the probability of a
nondefective item. If random sampling is being carried out, it is
frequently of importance to know the probability that the rth defective
unit will occur on the (x + r)th unit sampled.

To obtain the probability just described, it is noted that: (1) the last
unit must be defective and (2) in the preceding x + r - 1 units sampled
there must be exactly r - 1 defective units. Then,

$$f(x) = \{ C(x + r - 1,\, r - 1)\, p^{r - 1} q^{x} \} \cdot p = C(x + r - 1,\, r - 1)\, p^{r} q^{x}; \qquad x = 0, 1, \ldots. \qquad (5.39)$$

Another way of saying this is that the probability of the rth defective
unit occurring on the mth unit sampled is

$$g(m) = C(m - 1,\, r - 1)\, p^{r} q^{m - r}; \qquad m = r, r + 1, \ldots. \qquad (5.40)$$


It is sometimes of interest to know the probability of the rth defective
unit occurring on the rth or (r + 1)st or ... or nth unit sampled. This is
given by

$$\sum_{m=r}^{n} C(m - 1,\, r - 1)\, p^{r} q^{m - r} = \sum_{m=r}^{n} C(n, m)\, p^{m} q^{n - m} \qquad (5.41)$$

and the last expression may be found by consulting tables of the
cumulative binomial distribution.
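The identity in Equation (5.41) says that "the rth defective appears by the nth unit" is the same event as "at least r defectives among the first n units." A short numerical check, with r, n, and p chosen arbitrarily and with hypothetical helper names, might look like this:

```python
from math import comb


def neg_binom_pmf(m, r, p):
    """Probability that the r-th defective occurs on the m-th unit
    sampled (Equation 5.40)."""
    return comb(m - 1, r - 1) * p ** r * (1 - p) ** (m - r)


def binom_upper_tail(n, r, p):
    """P{at least r defectives in n units}: right side of Equation (5.41)."""
    return sum(comb(n, m) * p ** m * (1 - p) ** (n - m)
               for m in range(r, n + 1))


r, n, p = 3, 12, 0.1   # illustrative values only
left = sum(neg_binom_pmf(m, r, p) for m in range(r, n + 1))
right = binom_upper_tail(n, r, p)
print(left, right)     # the two sides agree up to rounding error
```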

If in Equation (5.39) we let r = 1, the negative binomial distribution
simplifies to the geometric distribution.

5.14      DISTRIBUTION OF A LINEAR COMBINATION OF
NORMALLY DISTRIBUTED VARIABLES

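Suppose we consider

$$U = \sum_{i=1}^{n} a_i X_i \qquad (5.42)$$

where, for the moment, all that is known is that the $a_i$ are constants
and the $X_i$ are variables. It is clear that

$$\mu_U = \sum_{i=1}^{n} a_i \mu_i \qquad (5.43)$$

and

$$\sigma_U^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2 + 2 \sum_{i<j} a_i a_j \sigma_{ij} \qquad (5.44)$$

where $\mu_i$ is the mean of $X_i$, $\sigma_i^2$ is the variance of $X_i$, and $\sigma_{ij}$ is the
covariance of $X_i$ and $X_j$. If all the $X_i$ are mutually (pairwise) independent,

$$\sigma_U^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2 \qquad (5.45)$$

since $\sigma_{ij}$ equals 0 if $X_i$ and $X_j$ are statistically independent.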

Consider now the case where $X_i$ is a random sample from a normal
population with mean $\mu_i$ and variance $\sigma_i^2$ (i = 1, ..., n). In this case it
may be shown that U is also normally distributed. If each $X_i$ is
randomly selected from the same normal population, that is, from a
population¹ $N(\mu, \sigma)$, then U is normally distributed with mean

$$\mu_U = \mu \sum_{i=1}^{n} a_i$$

and variance

$$\sigma_U^2 = \sigma^2 \sum_{i=1}^{n} a_i^2.$$
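For independent variables, the mean and variance of a linear combination follow mechanically from the coefficients. The sketch below uses a hypothetical helper, `linear_combination_moments`, and made-up numbers; the second call takes $a_i = 1/n$, which turns U into the sample mean and gives the familiar variance $\sigma^2/n$.

```python
def linear_combination_moments(a, mu, sigma):
    """Mean and variance of U = sum(a_i * X_i) when the X_i are
    independent with means mu_i and standard deviations sigma_i."""
    mean = sum(ai * mi for ai, mi in zip(a, mu))
    var = sum(ai ** 2 * si ** 2 for ai, si in zip(a, sigma))
    return mean, var


# A general combination with illustrative values: U = 2*X1 - X2.
print(linear_combination_moments([2, -1], [3.0, 4.0], [1.0, 2.0]))

# The sample mean of n = 4 observations from a population with
# mu = 50 and sigma = 6 is the special case a_i = 1/n.
n = 4
mean, var = linear_combination_moments([1 / n] * n, [50.0] * n, [6.0] * n)
print(mean, var)   # variance equals sigma^2 / n
```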

5.15      DISTRIBUTION   OF  THE  SAMPLE   MEAN    FOR 
NORMAL   POPULATIONS 

In Sections 5.1, 5.2, and 5.5, it was stated that:

(1)  $\mu_{\bar{X}} = \mu$,

(2)  $\sigma_{\bar{X}}^2 = \sigma^2/n$, and

(3)  regardless of the form of the sampled population (provided it
has a finite variance), the distribution of the sample mean is
asymptotically normal with mean $\mu$ and variance $\sigma^2/n$.

¹ The notation $N(\mu, \sigma)$ stands for "normally distributed with mean $\mu$ and
standard deviation $\sigma$."

In the present section it is stated (without proof) that if a random
sample is taken from a population $N(\mu, \sigma)$, then the sample mean will
be distributed $N(\mu, \sigma/\sqrt{n})$ for all values of n. It should be clear that
the probability density function for $\bar{X}$ is

$$f(\bar{x}) = \frac{\sqrt{n}}{\sigma \sqrt{2\pi}}\, e^{-n(\bar{x} - \mu)^2 / 2\sigma^2} \qquad (5.46)$$

and that $Z = \sqrt{n}(\bar{X} - \mu)/\sigma$ is $N(0, 1)$.

5.16      DISTRIBUTION  OF  THE   DIFFERENCE  OF 
TWO  SAMPLE    MEANS 

If a random sample of $n_1$ observations is obtained from a population
with mean $\mu_1$ and variance $\sigma_1^2$, and if a random sample of $n_2$
observations is obtained from a population with mean $\mu_2$ and variance $\sigma_2^2$, what
can be said about the distribution of $U = \bar{X}_1 - \bar{X}_2$, where $\bar{X}_1$ is the mean
of the first sample and $\bar{X}_2$ is the mean of the second sample?

Regardless of the form of the populations sampled, it is true that

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_{\bar{X}_1} - \mu_{\bar{X}_2} = \mu_1 - \mu_2 \qquad (5.47)$$

and, provided the two samples are independent,

$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \qquad (5.48)$$

If, however, the sampled populations are both normal, it is also true
that $U = \bar{X}_1 - \bar{X}_2$ is normally distributed with mean and variance given
by Equations (5.47) and (5.48). In this situation,

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \qquad (5.49)$$

is normally distributed with mean 0 and variance 1.

If the populations are not normal but both sample sizes are
sufficiently large, the central limit theorem may be invoked to achieve an
approximate normal distribution for the difference of two sample means.

5.17      CHI-SQUARE   DISTRIBUTION 

One particular distribution arises quite frequently in applied work
and is known as the chi-square distribution. When referring to the
chi-square distribution, the parameter $\nu$ is called the degrees of freedom.
The probability density function for chi-square with $\nu$ degrees of
freedom is given by

$$f(u) = \frac{u^{(\nu/2) - 1} e^{-u/2}}{2^{\nu/2}\, \Gamma(\nu/2)}; \qquad u > 0 \qquad (5.50)$$

where u is used rather than $\chi^2$ (chi-square) for ease in writing. The
cumulative chi-square distribution is tabled in Appendix 4 for all
integral values of $\nu$ from 1 through 100.

5.18      DISTRIBUTION     OF     THE     SUM     OF     SQUARES     OF 
INDEPENDENT   STANDARD    NORMAL   VARIATES 

If a random observation is obtained from a normal population with
mean $\mu$ and variance $\sigma^2$, then the variable

$$Z^2 = (X - \mu)^2 / \sigma^2 \qquad (5.51)$$

is distributed as chi-square with 1 degree of freedom. Now, consider
the variable

$$U = \sum_{i=1}^{k} (X_i - \mu_i)^2 / \sigma_i^2 \qquad (5.52)$$

where the $X_i$ are independently and normally distributed with means $\mu_i$
and variances $\sigma_i^2$. Then, U is distributed as chi-square with k degrees
of freedom.

It is clear that, if a random sample of size n is obtained from a
normal population with mean $\mu$ and variance $\sigma^2$,

$$U = \sum_{i=1}^{n} (X_i - \mu)^2 / \sigma^2 \qquad (5.53)$$

is distributed as chi-square with n degrees of freedom.

5.19      DISTRIBUTIONS  OF  THE  SAMPLE  VARIANCE 
AND   STANDARD   DEVIATION    FOR   NORMAL 
POPULATIONS 

It can be proved that the mean and variance, $\bar{X}$ and $s^2$, of a random
sample from a normal population are statistically independent. Further,
it is readily shown that the variable

$$U = (n - 1)s^2/\sigma^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / \sigma^2 \qquad (5.54)$$

is distributed as chi-square with $\nu = n - 1$ degrees of freedom.
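A simulation offers a quick plausibility check on Equation (5.54). The sketch below relies on two standard facts not stated in the text: a chi-square variable with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$. The population parameters, sample size, seed, and function name are arbitrary illustrative choices.

```python
import random
import statistics


def scaled_sum_of_squares(n, mu, sigma, rng):
    """Draw n normal observations and return
    U = sum((X_i - Xbar)^2) / sigma^2, as in Equation (5.54)."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / sigma ** 2


rng = random.Random(1)   # fixed seed so the run is repeatable
n = 6
us = [scaled_sum_of_squares(n, 10.0, 2.0, rng) for _ in range(20000)]

# With nu = n - 1 = 5, the simulated mean should be near 5 and the
# simulated variance near 2 * 5 = 10.
print("mean of U    :", statistics.fmean(us))
print("variance of U:", statistics.pvariance(us))
```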

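From the preceding result, the distributions of $m_2$, $\sqrt{m_2}$, $s^2$, and s
can be obtained. These are:

$$f(m_2) = \frac{1}{\Gamma\left(\frac{n - 1}{2}\right)} \left(\frac{n}{2\sigma^2}\right)^{(n-1)/2} m_2^{(n-3)/2}\, e^{-n m_2 / 2\sigma^2}, \qquad (5.55)$$

$$f(\sqrt{m_2}) = \frac{2}{\Gamma\left(\frac{n - 1}{2}\right)} \left(\frac{n}{2\sigma^2}\right)^{(n-1)/2} (\sqrt{m_2})^{n-2}\, e^{-n m_2 / 2\sigma^2}, \qquad (5.56)$$

$$f(s^2) = \frac{1}{\Gamma\left(\frac{n - 1}{2}\right)} \left(\frac{n - 1}{2\sigma^2}\right)^{(n-1)/2} (s^2)^{(n-3)/2}\, e^{-(n-1)s^2 / 2\sigma^2}, \qquad (5.57)$$

and

$$f(s) = \frac{2}{\Gamma\left(\frac{n - 1}{2}\right)} \left(\frac{n - 1}{2\sigma^2}\right)^{(n-1)/2} s^{n-2}\, e^{-(n-1)s^2 / 2\sigma^2}. \qquad (5.58)$$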


5.20      DISTRIBUTION   OF   "STUDENT'S"    t 

Consider two independent random variables, Z and U, where Z
follows a standard normal distribution and U follows a chi-square
distribution with $\nu$ degrees of freedom. Form the ratio

$$t = Z / \sqrt{U/\nu}. \qquad (5.59)$$

Then, the probability density function of t is

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\pi\nu}\; \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2}; \qquad -\infty < t < \infty \qquad (5.60)$$

and it is referred to as the t-distribution with $\nu$ degrees of freedom. This
distribution is extremely useful in many problems of statistical
inference. A table of cumulative percentage points of t is given in
Appendix 5.

5.21      DISTRIBUTION   OF  F 

Given two independently distributed chi-square variates, U with $\nu_1$
degrees of freedom and V with $\nu_2$ degrees of freedom, it may be shown
that

$$F = \frac{U/\nu_1}{V/\nu_2} \qquad (5.61)$$

is distributed as F with $\nu_1$ and $\nu_2$ degrees of freedom. The probability
density function is

$$f(F) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right) \Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{F^{(\nu_1/2) - 1}}{\left(1 + \nu_1 F/\nu_2\right)^{(\nu_1 + \nu_2)/2}}; \qquad F > 0. \qquad (5.62)$$

Of particular interest in applied statistics is the fact that when two
random samples are obtained, one from each of two normal populations,
the ratio

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$

is distributed as F with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
This will find application when analyses of variance are discussed later.
Appendix 6 gives certain percentage points of the F-distribution.

5.22     ORDER  STATISTICS 

Observations on a chance variable usually occur in random order.
However, in certain cases, observations ordered according to magnitude
are encountered. This can happen in two ways: (1) the observations
were obtained in random order but were subsequently reordered
according to magnitude, and (2) the observations naturally became
available in order of magnitude. As an example of the latter, consider
the life testing of a group of vacuum tubes. The first observation to
arise is that associated with the weakest tube (i.e., the tube with the
shortest life), the second observation is associated with the next
weakest tube, and so on. Since such data occur fairly often in industrial
applications, some sampling distributions associated with order
statistics will now be discussed.

Consider a population specified by $f(x)$, $a < x < b$, with cumulative
distribution function $F(x)$. Denote the smallest and largest values in a
random sample of n observations from this population by u and v,
respectively. Then it may be shown that

$$g(u, v) = n(n - 1) f(u) f(v) [F(v) - F(u)]^{n-2}; \qquad a \le u \le v \le b. \qquad (5.63)$$

The marginal p.d.f.'s of u and v are

$$g_1(u) = n f(u) [1 - F(u)]^{n-1}; \qquad a \le u \le b \qquad (5.64)$$

and

$$g_2(v) = n f(v) [F(v)]^{n-1}; \qquad a \le v \le b. \qquad (5.65)$$

These distributions are very useful when dealing with problems
involving extreme values.

Order statistics are also valuable when dealing with the sample range,
$R = v - u = X_{\max} - X_{\min}$. If in Equation (5.63) we let $v = u + R$, we obtain

$$g(u, R) = n(n - 1) f(u) f(u + R) [F(u + R) - F(u)]^{n-2}. \qquad (5.66)$$

Then

$$g(R) = \int_{a}^{b - R} g(u, R)\, du; \qquad 0 < R < b - a. \qquad (5.67)$$

It should be noted that if, instead of dealing with the joint distribution
of the range and the smallest sample value, we deal with the joint
distribution of the range and the largest sample value, namely,

$$g(v, R) = n(n - 1) f(v - R) f(v) [F(v) - F(v - R)]^{n-2}, \qquad (5.68)$$

then

$$g(R) = \int_{a + R}^{b} g(v, R)\, dv; \qquad 0 < R < b - a. \qquad (5.69)$$

Equations (5.67) and (5.69) will, naturally, produce the same result.

Example  5.4 

If $f(x) = 1$, $0 < x < 1$, then $g(R) = n(n - 1) R^{n-2} (1 - R)$ where $0 < R < 1$.
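The range density for the uniform population can be checked two ways: by integrating it numerically and by simulating sample ranges. In the sketch below the sample size n = 5, the midpoint-rule step count, the seed, and the function names are illustrative assumptions; the value $(n - 1)/(n + 1)$ for the mean range follows from integrating $R\, g(R)$ and is not quoted from the text.

```python
import random


def range_density(R, n):
    """g(R) = n(n - 1) R^(n - 2) (1 - R) for a uniform(0, 1) population."""
    return n * (n - 1) * R ** (n - 2) * (1 - R)


def simulate_mean_range(n, reps, rng):
    """Average of (largest - smallest) over `reps` uniform samples of size n."""
    total = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        total += max(xs) - min(xs)
    return total / reps


n = 5
steps = 10000
h = 1 / steps
mids = [(k + 0.5) * h for k in range(steps)]          # midpoint rule on (0, 1)

area = sum(range_density(R, n) for R in mids) * h
mean_from_density = sum(R * range_density(R, n) for R in mids) * h
mean_simulated = simulate_mean_range(n, 20000, random.Random(2))

print("area under g(R)        :", area)
print("mean range from density:", mean_from_density)
print("mean range, simulated  :", mean_simulated)
print("(n - 1)/(n + 1)        :", (n - 1) / (n + 1))
```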


Example  5.5 

There is no simple expression for the distribution of the range when
sampling from a normal population. Pearson (5) gave the values of the
mean and standard deviation for ranges from a standardized normal
distribution. Pearson and Hartley (6) evaluated the probability integral
of the range for sample sizes of 2 to 20. Incidentally, the mean and
standard deviation of the range when a standard normal population has
been sampled are denoted by $d_2$ and $d_3$, respectively. That is, when
sampling from any normal population, $\mu_R = d_2 \sigma_X$ and $\sigma_R = d_3 \sigma_X$. Selected
values of $d_2$ and $d_3$ are given in Appendix 8.

Problems 

5.1  How large a sample should be taken if we want to be 95 per cent sure
that $\bar{X}$ will not fall farther than $\sigma/2$ from $\mu$?

5.2  A  book  of  400  pages  contains  400  misprints.  Estimate  the  probability 
that  a  page  contains  at  least  three  misprints. 

5.3  A  lot  contains  1400  items.  A  sample  of  400  items  is  selected.  If  no 
more  than  two  defective  items  appear  in  the  sample,  the  lot  will  be 
accepted.   Evaluate   the   probability  that  the  lot   will  be   accepted, 
assuming  that  the  lot  is  1  per  cent  defective. 

5.4  The width of a slot on a forging is normally distributed with mean
0.900 inch and standard deviation 0.003 inch. The specifications are
0.900 ± 0.005 inch. What percentage of forgings will be defective?

5.5  Referring  to  Problem  5.4,  samples  of  size  5  are  obtained  daily  and 
their  means  computed.  What  percentage  of  these  sample  averages  will 
be  outside  specifications? 

5.6  The diameters of some shafts and some bearings are each normally
distributed with standard deviation equal to 0.001 inch. If the shaft
has a mean diameter of 0.500 inch and the bearing has a mean
diameter of 0.503 inch, what is the probability of interference?

5.7  Three resistors are connected in series. Their nominal ratings are 10,
15, and 20 ohms, respectively. If it is known that the resistors are
normally distributed about the nominal ratings, each having a
standard deviation of 0.5 ohm, what is the probability that an assembly
will have a resistance in excess of 46.5 ohms?

5.8  Rework  Problem  5.7  assuming  that  the  standard  deviation  is  5  per 
cent  of  nominal  in  each  case. 

5.9  A "1-pound" box of candy is machine packed to contain 32 pieces of
candy. If the weights of the pieces of candy are normally distributed
with a mean of 0.5 ounce and a standard deviation of 0.05 ounce, what
are the probabilities that a customer receives:

(a) less than 1 pound, (b) less than 15 ounces, (c) more than 1 pound,
(d) more than 16.2 ounces, (e) exactly 1 pound?

5.10  Referring to Problem 5.9 and assuming the standard deviation
remains unchanged, how should you change the mean of the process
so that only 1 customer in 100 will receive less than the advertised
weight?

5.11  A factory assembles stoves at the rate of 500 per week. On the average,
5 per cent of the stoves are found to be defective when inspected
following final assembly. What is the probability that next week's
production will contain less than 20 defective stoves?

5.12  Review all parts of the book pertaining to the Poisson, normal,
chi-square, t, and F distributions, and be certain that you know how to
use the tables in Appendices 2 through 6.

CHAPTER 6

STATISTICAL  INFERENCE:  ESTIMATION 

IN THIS CHAPTER, general concepts associated with that part of
statistical inference referred to as "estimation and prediction" will be
examined. Examples dealing with particular populations frequently
encountered in applied work will also be given.

6.1      SOME   PRELIMINARY   IDEAS 

In general, we do not know the values of the parameters of the
distribution function or the values of the population mean and variance.
In practice we obtain a random sample from the specified population,
assuming that we know the form of the distribution (normal, binomial,
etc.). From the sample we attempt to estimate the true but unknown
values of the population parameters. At this point criteria should be
stated by which we may judge, or evaluate, different estimators of a
parameter. First, let us define an estimator as some function of the
sample values which will provide us with an estimate of the parameter
in question. Now, let us set down certain desirable properties of a good
estimator which may be used as criteria to distinguish between good
and bad estimators. Other criteria may be found in the literature, but
the three given here are perhaps the most important from a practical
point of view.

(1)  An estimator is said to be unbiased if the expected value of the
estimator is equal to the population quantity being estimated.
That is, if $\hat{\theta}$ is an estimator of $\theta$, $\hat{\theta}$ is said to be unbiased if the
average of all possible values of $\hat{\theta}$ is $\theta$.

(2)  Let $\hat{\theta}$ be an estimator of $\theta$ calculated from a random sample of
size n. If, as n gets very large (i.e., approaches N where N is the
number of items in the population), the probability that $\hat{\theta}$ will
be very close to $\theta$ approaches 1, or certainty, then $\hat{\theta}$ is called a
consistent estimator of $\theta$. In other words, if we take a larger and
larger sample, we expect to get an estimate which is very close
to the true value, and the probability that we will do so is very
great.

(3)  If $\hat{\theta}_1$ and $\hat{\theta}_2$ are two different (but both unbiased) estimators of $\theta$
with variances $\sigma_{\hat{\theta}_1}^2$ and $\sigma_{\hat{\theta}_2}^2$, respectively, and if $\sigma_{\hat{\theta}_1}^2 < \sigma_{\hat{\theta}_2}^2$, then
we prefer $\hat{\theta}_1$ to $\hat{\theta}_2$. That is, in general, we prefer the estimator
(out of the class of all unbiased estimators) which has the
minimum variance.

Estimates of the type discussed above are of a special kind known as
point estimates. There is a second class of estimates, however, known as
interval estimates. These are very important in statistical methodology
and, if at all possible, we obtain an estimate of this type. Let us
illustrate the difference between point and interval estimates by a short
example.

Example  6.1 

If I wish to estimate the average weight of the people in a
classroom, I could take a random sample of five people, record their weights,
and average them. The resulting average (suppose it turned out to be
160 pounds) would be my point estimate. However, this is not
sufficient for our purposes. If I say that the true average weight of all the
people in the room (they are my population) is between 0 and 300
pounds, I am very confident of myself; in fact, I am almost certain of
my statement. But if I make my interval much smaller (and in
practice the interval should be as small as possible), my degree of confidence
in my interval estimate will become less. For example, if I say I think
that the average weight of all people in the room is between 158 and
162 pounds, my degree of confidence may be quite small.

If I wish to be able to evaluate my degree of confidence for any
interval estimate, it is customary to make certain assumptions
concerning the distribution of the observations being obtained. Several
examples of such confidence intervals will be studied later in this
chapter.

6.2      METHODS  OF  OBTAINING  POINT  ESTIMATORS 

Several principles of estimation, leading to routine mathematical
procedures, have been proposed for obtaining "good" estimators.
These include:

(1)  The principle of moments

(2)  Minimum chi-square

(3)  The method of least squares

(4)  The principle of maximum likelihood.

The application of these principles in particular cases will lead to
estimators which may differ and hence possess different attributes of
"goodness." A principle much in use, yielding estimators with many
desirable attributes of "goodness" and obtained by easily applied
routine mathematical procedures, is that of maximum likelihood
devised by R. A. Fisher (7, 8). This important principle of estimation
will be used in the remainder of the chapter.

The procedure for determining the maximum likelihood estimate of
a population parameter $\theta$ is as follows:

(1) Determine the density function of the sample,
    $g(X_1, X_2, \ldots, X_n; \theta)$. Note that in Section 5.0,
    $g(X_1, X_2, \ldots, X_n; \theta)$ was referred to as the likelihood
    function.
(2) Determine

    $$L = \log g(X_1, X_2, \ldots, X_n; \theta) = \sum_{i=1}^{n} \log f(X_i; \theta),$$

    where $f(X_i; \theta)$ is the density of the $i$th observation.
    This step is not essential. However, since likelihood functions
    are products, and since sums are usually more convenient to deal
    with than products, it is customary to maximize the logarithm
    of the likelihood rather than the likelihood itself.
(3) Determine the value of $\theta$ which will maximize $L$ by solving the
    equation

    $$\partial L / \partial \theta = 0.$$
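The three steps above can be sketched numerically. The following is only an illustration under stated assumptions: the counts are hypothetical data, a Poisson population is assumed, and the function names `poisson_log_likelihood` and `maximize_on_grid` are invented for this sketch. A crude grid search stands in for solving the likelihood equation, and its answer is compared with the sample mean, which is the known maximum likelihood estimator for the Poisson parameter.

```python
import math


def poisson_log_likelihood(lam, xs):
    """Log of the likelihood of independent Poisson counts xs at parameter lam."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)


def maximize_on_grid(xs, lo=0.01, hi=20.0, steps=2000):
    """Crude grid search for the lam that maximizes the log likelihood."""
    best_lam = lo
    best_val = poisson_log_likelihood(lo, xs)
    for i in range(1, steps + 1):
        lam = lo + (hi - lo) * i / steps
        val = poisson_log_likelihood(lam, xs)
        if val > best_val:
            best_lam, best_val = lam, val
    return best_lam


counts = [3, 5, 2, 4, 6, 4]                   # hypothetical sample of counts
lam_hat_numeric = maximize_on_grid(counts)    # step (3), done numerically
lam_hat_closed = sum(counts) / len(counts)    # sample mean, the closed-form answer
print(lam_hat_numeric, lam_hat_closed)
```

The two values agree to within the spacing of the grid, which is all a search of this kind can promise.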


6.3      MAXIMUM    LIKELIHOOD    ESTIMATORS 

Rather than burden the reader with the details of obtaining maximum
likelihood estimators, the results for four of the more common
distributions are presented in Table 6.1.


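TABLE 6.1-Maximum Likelihood Estimators Associated With Certain
Distributions

Distribution   Parameter            Maximum Likelihood Estimator

Binomial       $p$                  $\hat{p} = f/n$ = observed relative frequency

Poisson        $\lambda\ (=\mu)$    $\hat{\lambda} = \hat{\mu} = \bar{X} = \sum X/n$

Normal         $a\ (=\mu)$          $\hat{a} = \hat{\mu} = \bar{X} = \sum X/n$
               $b\ (=\sigma^2)$     $\hat{b} = \hat{\sigma}^2 = \sum (X - \bar{X})^2/n$

Exponential    $\theta\ (=\mu)$     $\hat{\theta} = \hat{\mu} = \bar{X} = \sum X/n$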

6.4      CONFIDENCE    INTERVALS:   GENERAL    DISCUSSION 

A point estimate of a parameter is not very meaningful without some
measure of the possible error in the estimate. An estimate $\hat{\theta}$ of a
parameter $\theta$ should be accompanied by some interval about $\hat{\theta}$,
possibly of the form $\hat{\theta} - d$ to $\hat{\theta} + d$, together with some measure of
assurance that the true parameter $\theta$ does lie within the interval.
Estimates are often given in such form. Thus, the activated life of a
thermal battery may be estimated to be 300 ± 20 seconds with the idea
that the life is unlikely to be less than 280 seconds or greater than
320 seconds. The development engineer engaged in research on capacitors
may estimate the mean life of a certain type of capacitor under stated
conditions to be 300 ± 50 hours with the implication that the correct
average life very probably lies between 250 and 350 hours. The failure
rate for a specific component might be estimated as being less than
0.02 with the feeling


90  CHAPTER    6,    STATISTICAL   INFERENCE:    ESTIMATION 

that the true failure rate is most likely no greater than the stated limit.
In this last case, the point estimate might have been anywhere between
0 and 0.02.

Confidence intervals enable us to obtain a useful type of information
about population parameters without the necessity of treating such
parameters as statistical variables. It should be clearly understood that
we are merely betting on the correctness of the rule of procedure when
applying confidence interval techniques to a given experiment. It will
be observed in the following sections that this technique may be
applied to various familiar population parameters such as the mean and
variance.

An examination of the following sections will reveal that the method
for finding confidence intervals consists in first finding a random
variable, call it $Z$, that involves the desired parameter $\theta$ but the
distribution of which does not depend upon any other unknown
parameters. Next, two numbers, $Z_1$ and $Z_2$, are chosen such that

$$P\{Z_1 < Z < Z_2\} = \gamma$$  (6.1)

where $\gamma$ is the desired confidence coefficient, such as 0.95. Then the
two inequalities are manipulated so that the probability statement
assumes the form

$$P\{L < \theta < U\} = \gamma$$  (6.2)

where $L$ and $U$ are random variables depending on $Z$ but not involving
$\theta$. Finally, we substitute the sample values in $L$ and $U$ to obtain a
numerical interval which is the desired confidence interval. It is clear
that any number of confidence intervals can be constructed for a
parameter by choosing $Z_1$ and $Z_2$ differently each time or by choosing
different random variables of the $Z$ type.

The above has been concerned with what is called a two-sided
confidence interval. However, we sometimes do not care how much our
estimate may err in one direction provided that it is not too far off in
the other. For example, we may be estimating a standard deviation
which we hope will be small. We would be concerned only about an
upper limit and hence would want an interval of the form

$$P\{\theta < U\} = \gamma.$$  (6.3)

The theory of one-sided intervals is basically the same as for two-sided
intervals.

6.5  CONFIDENCE INTERVAL FOR THE MEAN OF A NORMAL
POPULATION

All the necessary statistics are now available to make possible an
excellent scheme for estimating the mean of a normal population. It has
already been said that $\bar{X}$ is an unbiased estimate of $\mu$. However, it is
possible to learn a little more about the estimate, namely, whether $\bar{X}$
is close to $\mu$ or likely to be far removed from $\mu$. Making use of the
t-distribution, the following statement can be made:^1

$$P\{\bar{X} - t_{0.975(n-1)}\, s_{\bar{X}} < \mu < \bar{X} + t_{0.975(n-1)}\, s_{\bar{X}}\}$$
$$= P\{\bar{X} - t_{0.975(n-1)}\, s/\sqrt{n} < \mu < \bar{X} + t_{0.975(n-1)}\, s/\sqrt{n}\} = 0.95$$  (6.4)

where $t_{0.975(n-1)}$ is a numerical quantity extracted from the table in
Appendix 5 under the column labeled 0.975 and for $n - 1$ degrees of
freedom. The above statement (Equation 6.4) is read: the probability,
before the sample is drawn, that the random interval

$$(\bar{X} - t_{0.975(n-1)}\, s_{\bar{X}},\ \bar{X} + t_{0.975(n-1)}\, s_{\bar{X}})$$

will cover, or include, the true population mean $\mu$, is equal to 0.95.
Thus, if a random sample is obtained from a normal population with
mean $\mu$ and variance $\sigma^2$, and the two quantities

$$L = \bar{X} - t_{0.975(n-1)}\, s_{\bar{X}}$$  (6.5)

and

$$U = \bar{X} + t_{0.975(n-1)}\, s_{\bar{X}}$$  (6.6)

are computed, it can be said that one is 95 per cent confident that the
true mean $\mu$ will be in the interval $(L, U)$. One does not say that the
probability is 0.95 that $\mu$ lies between $L$ and $U$ but only that one is
95 per cent confident that $\mu$ does lie between $L$ and $U$. This distinction
is made because $\mu$ either does or does not fall between $L$ and $U$; the
probability is either 0 or 1, for $\mu$ is a constant and does not possess a
probability distribution. The distinction made above is a subtle one
and the concept may not be fully appreciated at this time. However,
it is a distinction that must be made.
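The "before the sample is drawn" reading of Equation (6.4) can be illustrated by simulation. The sketch below is not part of the original development: the population mean and standard deviation, the number of trials, and the function name `coverage` are assumptions chosen for illustration, and 2.571 is the tabulated value of $t_{0.975}$ for 5 degrees of freedom. Many samples of size six are drawn from a normal population and the fraction of intervals $(L, U)$ that cover the fixed mean is recorded.

```python
import random
import statistics

T_975_DF5 = 2.571  # tabulated t value for 0.975 and 5 degrees of freedom


def coverage(mu=2206.0, sigma=2.0, n=6, trials=20000, seed=1):
    """Fraction of random intervals (L, U) that include the fixed mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        std_err = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
        if xbar - T_975_DF5 * std_err < mu < xbar + T_975_DF5 * std_err:
            hits += 1
    return hits / trials


observed = coverage()
print(observed)
```

Each individual interval either contains the mean or it does not; it is the long-run fraction of intervals, the property of the procedure, that settles near 0.95.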

Example  6.2 

Consider the estimation of the mean breaking strength of some
particular material. We take at random a number of samples, for this
example, six, and subject them to test, recording the pressure at which
they break. These values might be as follows:

2206 lbs.    2203 lbs.

2209 lbs.    2206 lbs.

2205 lbs.    2207 lbs.

Averaging these values, we obtain a point estimate, that is, one value,
of 2206 lbs. This means that, from our sample, a reasonable estimate of
the true (population) average breaking strength of the material is
2206 lbs. However, we do not have any measure of our degree of
confidence in this estimate. If we are willing to assume a normal
distribution, we can find:

$$L = \bar{X} - t_{0.975(n-1)}\, s_{\bar{X}} = 2206 - 2.571(0.8165) = 2203.9 \text{ lbs.}$$

$$U = \bar{X} + t_{0.975(n-1)}\, s_{\bar{X}} = 2206 + 2.571(0.8165) = 2208.1 \text{ lbs.},$$

where $t_{0.975(5)} = 2.571$ was obtained from the table in Appendix 5 and all
other values were calculated from our sample (here $s = 2$, so that
$s_{\bar{X}} = 2/\sqrt{6} = 0.8165$). We can now say that we are 95 per cent
confident that the true population mean breaking strength lies between
2203.9 lbs. and 2208.1 lbs. A 99 per cent confidence interval can be
found in a similar manner using $t_{0.995(5)} = 4.032$. It should be noted
that, in general, a $100\gamma$ per cent confidence interval, $0 < \gamma < 1$, may be
obtained by using $t_{[(1+\gamma)/2](n-1)}$ in Equation (6.4).

^1 The symbol $s_{\bar{X}}$ is known as the standard error of the mean, and it
is clearly an estimator of $\sigma_{\bar{X}}$ as defined by Equation (5.5).
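The arithmetic of Example 6.2 can be checked with a few lines of Python. This is a minimal sketch: the six breaking strengths and the value 2.571 are taken from the example, while the variable names are chosen here for illustration.

```python
import math
import statistics

strengths = [2206, 2203, 2209, 2206, 2205, 2207]  # lbs., from Example 6.2
t_975 = 2.571                                      # t value for 5 degrees of freedom

n = len(strengths)
xbar = statistics.mean(strengths)   # point estimate of the mean
s = statistics.stdev(strengths)     # sample standard deviation (divisor n - 1)
std_err = s / math.sqrt(n)          # standard error of the mean

lower = xbar - t_975 * std_err
upper = xbar + t_975 * std_err
print(round(std_err, 4), round(lower, 1), round(upper, 1))
```

Replacing 2.571 by 4.032 gives the 99 per cent interval mentioned in the example.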


Rather than proceed as in Example 6.2, we might have wanted only
a lower confidence limit. That is, we might have no interest in an upper
limit on breaking strength, since ordinarily no harm can result from the
material being too strong. The statement needed (assuming a 0.95
confidence coefficient) is then

$$P\{\bar{X} - t_{0.95(n-1)}\, s_{\bar{X}} < \mu\} = 0.95.$$  (6.7)

Note that here $t_{0.95(n-1)}$ is used instead of $t_{0.975(n-1)}$ since we want the
entire 0.05 error risk to be on one side of the limit rather than to be
split equally beyond two limits. Thus, we would obtain

$$L = \bar{X} - t_{0.95(n-1)}\, s_{\bar{X}} = 2206 - 2.015(0.8165) = 2204.4,$$  (6.8)

and could then state that we are 95 per cent confident that the true
population mean breaking strength is above 2204.4 pounds. In general,
the lower limit would be $L = \bar{X} - t_{\gamma(n-1)}\, s_{\bar{X}}$.

It should be clear that if only a $100\gamma$ per cent upper confidence limit
is desired, the procedure would be to calculate

$$U = \bar{X} + t_{\gamma(n-1)}\, s_{\bar{X}}.$$  (6.9)


6.6     CONFIDENCE  INTERVAL  FOR  THE  MEAN  OF  A  NON- 
NORMAL  POPULATION 

A question which might logically arise is "What can we do if we want
a confidence interval estimate of the mean of a nonnormal population?"
The central limit theorem discussed in Chapter 5 provides us with an
answer which is often satisfactory. That is, unless the distribution is
much different from normal and the sample size is extremely small, the
distribution of sample means will be nearly normal so that the normal
theory may be applied with only a small error.

However, if the error introduced by the approximate procedure
suggested in the preceding paragraph cannot be tolerated, we always have
recourse to exact methods associated with the particular population
distribution involved. No attempt will be made to list all the different
situations. Rather, we shall state only that the basic approach is always
the same as outlined in Section 6.4. If the need arises for an exact answer
for a nonnormal distribution, the reader is referred to many such
examples in the literature. If the particular case in question cannot be
located in this manner, a mathematical statistician should be
consulted.

6.7      CONFIDENCE   INTERVAL   FOR  THE  VARIANCE  OF  A 
NORMAL   POPULATION 

Using a technique similar to that outlined in Section 6.5, a confidence
interval for estimating the variance of a normal population can be
found. This time, however, the chi-square distribution will be used to
obtain the confidence interval specified by

$$P\left\{ \frac{(n-1)s^2}{\chi^2_{[(1+\gamma)/2](n-1)}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{[(1-\gamma)/2](n-1)}} \right\} = \gamma$$  (6.10)

and this is read: the probability, before the sample is drawn, that the
random interval $(L, U)$, where

$$L = \frac{(n-1)s^2}{\chi^2_{[(1+\gamma)/2](n-1)}}$$  (6.11)

and

$$U = \frac{(n-1)s^2}{\chi^2_{[(1-\gamma)/2](n-1)}},$$  (6.12)

will include the true population variance $\sigma^2$ is equal to $\gamma$. Or, as it is
more often phrased, we are $100\gamma$ per cent confident that the true
population variance $\sigma^2$ will be in the interval $(L, U)$. For the example
used in Section 6.5, we find the 90 per cent confidence interval for
$\sigma^2$ to be (1.8, 17.5).
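The interval (1.8, 17.5) quoted above can be reproduced as follows. This is a sketch under two assumptions that are not spelled out in the text: the data are the six breaking strengths of Example 6.2, and the chi-square percentage points for 5 degrees of freedom (1.145 and 11.070) are copied from a standard chi-square table rather than computed.

```python
strengths = [2206, 2203, 2209, 2206, 2205, 2207]  # lbs., from Example 6.2

# Chi-square percentage points for 5 degrees of freedom (standard table values).
CHI2_05_DF5 = 1.145    # lower 5 per cent point
CHI2_95_DF5 = 11.070   # upper 5 per cent point (0.95 fractile)

n = len(strengths)
xbar = sum(strengths) / n
sum_sq_dev = sum((x - xbar) ** 2 for x in strengths)  # equals (n - 1) * s^2

lower = sum_sq_dev / CHI2_95_DF5   # divide by the larger fractile for L
upper = sum_sq_dev / CHI2_05_DF5   # divide by the smaller fractile for U
print(round(lower, 1), round(upper, 1))
```

Note that the larger chi-square value produces the lower limit, because it appears in the denominator.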

As with means, we can determine a one-sided confidence interval.
This would be defined by

$$P\left\{ \sigma^2 < \frac{(n-1)s^2}{\chi^2_{(1-\gamma)(n-1)}} \right\} = \gamma$$  (6.13)

if an upper limit is desired. Although a lower limit is conceivable, it
would seldom be of interest.

If we are interested in a confidence interval for estimating $\sigma$ rather
than $\sigma^2$, the confidence limits $L' = \sqrt{L}$ and $U' = \sqrt{U}$, where $L$ and $U$
are the confidence limits for $\sigma^2$, may be computed. It should be noted
that this is not the exact solution. However, it is sufficiently accurate
for most purposes.

6.8  CONFIDENCE INTERVAL FOR p, THE PARAMETER
OF A BINOMIAL POPULATION

It has already been suggested in Section 6.3 that the best point
estimate of $p$ is

$$\hat{p} = f/n = \text{observed relative frequency}.$$  (6.14)

If a two-sided, $100\gamma$ per cent confidence interval estimate of $p$ is
desired, the following two equations must be solved for $p$:

$$\sum_{x=f}^{n} C(n, x)\, p^x (1 - p)^{n-x} = (1 - \gamma)/2$$  (6.15)

$$\sum_{x=0}^{f} C(n, x)\, p^x (1 - p)^{n-x} = (1 - \gamma)/2$$  (6.16)

The solution of Equation (6.15) is $L$, while the solution of Equation
(6.16) is $U$. If $f = 0$, $L$ is taken to be 0; if $f = n$, $U$ is taken to be 1. You
may then state that you are $100\gamma$ per cent confident that the true
value of $p$ is between $L$ and $U$.

Example  6.3 

Consider an industrial process producing parts which are classified
either as defective or nondefective. In a random sample of 200 items,
6 are found to be defective. Thus $\hat{p} = 6/200 = 0.03$. To obtain $L$ and $U$
such that we would have 95 per cent confidence in the limits, we would
substitute 200 for $n$, 6 for $f$, and 0.95 for $\gamma$ in Equations (6.15) and
(6.16) and solve using tables of the binomial distribution. However,
due to the nature of the tables, only approximate answers would be
possible. That is, the tables are not published for small enough
increments of $p$ to permit an exact solution. Interpolation would be
necessary.
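With a computer, the two equations for the binomial limits can be solved directly instead of by interpolation in tables. The following is a hedged sketch, not a procedure given in the text: the tail sums are solved by simple bisection, and the function names (`upper_tail`, `lower_tail`, `bisect`, `binomial_limits`) are invented for the illustration. It is applied to the data of Example 6.3 (n = 200, f = 6, 95 per cent confidence).

```python
import math


def upper_tail(p, f, n):
    """P(X >= f) for a binomial variable with parameters n and p."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(f, n + 1))


def lower_tail(p, f, n):
    """P(X <= f) for a binomial variable with parameters n and p."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, f + 1))


def bisect(func, target, increasing, lo=0.0, hi=1.0, iters=60):
    """Solve func(p) = target on (lo, hi) for a monotone function of p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid   # the root lies above mid
        else:
            hi = mid   # the root lies below mid
    return (lo + hi) / 2


def binomial_limits(f, n, gamma):
    """Two-sided confidence limits (L, U) for p with confidence coefficient gamma."""
    alpha = (1 - gamma) / 2
    lower = 0.0 if f == 0 else bisect(lambda p: upper_tail(p, f, n), alpha, True)
    upper = 1.0 if f == n else bisect(lambda p: lower_tail(p, f, n), alpha, False)
    return lower, upper


L, U = binomial_limits(6, 200, 0.95)
print(L, U)
```

The upper-tail sum increases with p and the lower-tail sum decreases with p, which is why a single bisection routine with a direction flag is enough.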

Because the computation involved in solving Equations (6.15) and
(6.16) is tedious, several attempts have been made to provide
convenient tables and graphs for the research worker to use. For example,
Hald (12) has published comprehensive tables for certain sample sizes.
More condensed tables are given in Snedecor (20). Clopper and
Pearson (5) published charts which are very helpful. Calvert (4) gives
charts and nomographs from which one-sided upper confidence limits
can be read with reasonable accuracy. Muench (15) has constructed a
compact and easily used slide rule which extends the charts provided
by Calvert.

As pointed out in Sections 5.10 and 5.11, it is often possible to
approximate the binomial distribution by the Poisson or normal
distribution. These approximations can sometimes be used to advantage in
establishing confidence intervals for binomial probabilities. However,
the details will not be discussed here.

6.9  CONFIDENCE INTERVAL FOR THE DIFFERENCE BETWEEN
THE MEANS OF TWO NORMAL POPULATIONS

Many  practical  problems  in  statistics  involve  the  comparison  of  two 
sample  means.  When  the  two  random  samples  from  which  the  means 
are  computed  can  be  assumed  to  have  come  from  normal  populations, 
confidence  limits  for  the  true  difference,  that  is,  for  the  difference 
between  the  means  of  the  two  populations  may  be  computed. 

Case I: $\sigma_1^2 = \sigma_2^2$

If it can be assumed that the two normal populations have equal
variances, that is, if we can assume a common variance $\sigma^2$, then the
ratio

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s\,[(n_1 + n_2)/n_1 n_2]^{1/2}}$$  (6.17)

is distributed as Student's t with $n_1 + n_2 - 2$ degrees of freedom if $s^2$ is
calculated by means of the formula

$$s^2 = \frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$  (6.18)

In Equation (6.18) the expressions $\sum x_1^2$ and $\sum x_2^2$ represent the sum
of the squares of the deviations about the means in the first and
second samples, respectively. Also, $s^2$ is often referred to as the pooled
estimate of variance.

Under the assumptions stated above, $100\gamma$ per cent confidence limits
for $\mu_1 - \mu_2$ may be found by calculating

$$L = (\bar{X}_1 - \bar{X}_2) - t_{[(1+\gamma)/2](n_1+n_2-2)}\, s_{\bar{X}_1 - \bar{X}_2}$$
$$U = (\bar{X}_1 - \bar{X}_2) + t_{[(1+\gamma)/2](n_1+n_2-2)}\, s_{\bar{X}_1 - \bar{X}_2}.$$  (6.19)

Example  6.4 

There are two methods of measuring the moisture content of
heat-processed beef. For Method 1 we obtain $\bar{X}_1 = 88.6$, $s_1^2 = 109.63$, and
$n_1 = 41$. For Method 2 the comparable results are: $\bar{X}_2 = 85.1$,
$s_2^2 = 65.99$, and $n_2 = 31$. Thus,

$$s^2 = \{40(109.63) + 30(65.99)\}/70 = 90.93,$$

and

$$s_{\bar{X}_1 - \bar{X}_2} = \{(90.93/41) + (90.93/31)\}^{1/2} = 2.27.$$

Finally, assuming an 80 per cent confidence interval is desired, we
obtain $L = 3.5 - (1.294)(2.27) = 0.6$ and $U = 3.5 + (1.294)(2.27) = 6.4$.
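The pooled-variance computation of Example 6.4 is easy to verify numerically. The sketch below uses the summary statistics of the example, reading the first sample mean as 88.6 (the value consistent with the difference of 3.5 used there) and taking 1.294 as the tabulated t value for 70 degrees of freedom quoted in the example. Variable names are illustrative.

```python
import math

# Summary statistics from Example 6.4
n1, xbar1, var1 = 41, 88.6, 109.63   # Method 1
n2, xbar2, var2 = 31, 85.1, 65.99    # Method 2
t_90 = 1.294                         # t value used in the example (80 per cent, 70 d.f.)

# Pooled estimate of the common variance, Equation (6.18)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means
std_err_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)

diff = xbar1 - xbar2
lower = diff - t_90 * std_err_diff
upper = diff + t_90 * std_err_diff
print(round(pooled_var, 2), round(std_err_diff, 2), round(lower, 1), round(upper, 1))
```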


Case II: $\sigma_1^2 \neq \sigma_2^2$

If there is reason to believe that the two populations have different
variances, the procedure just discussed is not appropriate. What, then,
can be done? If we are willing to assume that $s_1^2 = \sigma_1^2$ and $s_2^2 = \sigma_2^2$, an
approximate $100\gamma$ per cent confidence interval may be found by
calculating

$$(\bar{X}_1 - \bar{X}_2) \mp z_{[(1+\gamma)/2]}\, (s_1^2/n_1 + s_2^2/n_2)^{1/2}$$  (6.20)

where $z_{[(1+\gamma)/2]}$ is the $100(1 + \gamma)/2$ fractile of the standard normal
distribution. However, because of the doubtful validity of the
assumption that the sample variances equal the population variances, this
procedure provides only a very crude estimate of the true mean
difference. Consequently, the procedure should be used with extreme
caution.

Case III: Paired Observations

If two samples of equal size are obtained (that is, if $n_1 = n_2 = n$) and
if the observations in one sample can logically be paired with the
observations in the other sample, a modified procedure applies. By
pairing, it is meant that the observations $(X_1, X_2, \ldots, X_n)$ and the
observations $(Y_1, Y_2, \ldots, Y_n)$ are associated as follows:

$X_1$ is related to $Y_1$
$X_2$ is related to $Y_2$
  . . .
$X_n$ is related to $Y_n$.

In the language of a later chapter, the variables $X$ and $Y$ are said to be
correlated. When such a correlation is assumed to exist, an appropriate
procedure is to calculate the differences, $D = X - Y$, and then estimate
$\mu_D = \mu_X - \mu_Y$. Confidence limits are then given by

$$\bar{D} \mp t_{[(1+\gamma)/2](n-1)}\, s_{\bar{D}}$$  (6.21)

where $s_{\bar{D}}^2 = s_D^2/n$ and $s_D^2 = \{\sum D^2 - (\sum D)^2/n\}/(n - 1)$.


Example  6.5 

It is desired to compare the prices of Delicious and McIntosh apples.
On a certain day, prices (per box) were obtained from a random
selection of eleven markets. Assuming (1) prices to be normally distributed
and (2) the price of one variety in a market would be influenced by the
price of the other variety in the same market, the method of paired
observations will be used. The data are given in Table 6.2. Calculation
yields $\bar{D} = 0.14$, $s_D^2 = 0.0018$, and $s_{\bar{D}}^2 = 0.0018/11$. Therefore, a 95 per
cent confidence interval for $\mu_D$ is specified by

$$0.14 \mp (2.228)(0.0018/11)^{1/2} = (\$0.11, \$0.17).$$

Although not illustrated in Example 6.5, it should be obvious that some
of the differences could be negative. This is so because the differences
are defined as $D = X - Y$, not as $D = |X - Y|$. Actually, it does not
matter whether $X - Y$ or $Y - X$ is used as long as the same choice is used
throughout a given problem.

Rather than burden the reader with excessive repetition, we shall
only remind him that one-sided confidence limits are also possible. All
that is necessary is a change in the value of $t$ in Equations (6.19) and
(6.21), or a change in the value of $z$ in Equation (6.20).


TABLE 6.2-Price per Box of Delicious and McIntosh Apples

Market    Delicious    McIntosh    Difference

   1        $2.15        $2.32        $0.17
   2         2.16         2.34         0.18
   3         2.13         2.30         0.17
   4         2.25         2.40         0.15
   5         2.20         2.34         0.14
   6         2.18         2.20         0.02
   7         2.27         2.42         0.15
   8         2.21         2.36         0.15
   9         2.23         2.36         0.13
  10         2.16         2.30         0.14
  11         2.20         2.34         0.14
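The paired-observation interval of Example 6.5 can be recomputed from the prices in Table 6.2. This is a minimal sketch: the prices and the t value 2.228 (10 degrees of freedom) come from the example, the differences are taken as McIntosh minus Delicious to match the table, and the variable names are chosen here.

```python
import math
import statistics

# Prices per box from Table 6.2, markets 1 through 11
delicious = [2.15, 2.16, 2.13, 2.25, 2.20, 2.18, 2.27, 2.21, 2.23, 2.16, 2.20]
mcintosh = [2.32, 2.34, 2.30, 2.40, 2.34, 2.20, 2.42, 2.36, 2.36, 2.30, 2.34]
t_975 = 2.228  # t value for 10 degrees of freedom, as used in Example 6.5

diffs = [m - d for m, d in zip(mcintosh, delicious)]
n = len(diffs)

d_bar = statistics.fmean(diffs)       # mean difference
var_d = statistics.variance(diffs)    # variance of the differences (divisor n - 1)
std_err = math.sqrt(var_d / n)        # standard error of the mean difference

lower = d_bar - t_975 * std_err
upper = d_bar + t_975 * std_err
print(round(d_bar, 2), round(var_d, 4), round(lower, 2), round(upper, 2))
```

Working with the eleven differences reduces the two-sample problem to the one-sample interval for a mean.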

6.10      CONFIDENCE    INTERVAL   FOR  THE    RATIO  OF  THE 
VARIANCES  OF  TWO   NORMAL   POPULATIONS 

The problem of estimating the ratio of two population variances (or
standard deviations) is also frequently encountered. If the two
populations are normal, the F-distribution may be used to provide the
desired confidence intervals. The procedure is to calculate

$$L = \left(\frac{s_2^2}{s_1^2}\right) F_{[(1-\gamma)/2](n_1-1,\, n_2-1)}$$  (6.22)

and

$$U = \left(\frac{s_2^2}{s_1^2}\right) F_{[(1+\gamma)/2](n_1-1,\, n_2-1)},$$  (6.23)

and these limits define a $100\gamma$ per cent confidence interval for
$\sigma_2^2/\sigma_1^2$. Should only an upper (or lower) limit be desired, it can
easily be found by using $F_{\gamma}$ in Equation (6.23) or $F_{1-\gamma}$ in Equation
(6.22). One other useful result is the following: If only an abbreviated
F-table is available (e.g., one that contains only the upper percentage
points), the identity

$$F_{p(\nu_1, \nu_2)} = \frac{1}{F_{(1-p)(\nu_2, \nu_1)}}$$  (6.24)

permits the calculation of F-values at the left-hand tail of the
distribution.

Example  6.6 

Using the data given in Example 6.4, 99 per cent confidence limits
for $\sigma_2^2/\sigma_1^2$ are found to be:

L = (65.99/109.63)(0.416) = 0.25
U = (65.99/109.63)(2.52) = 1.52.
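The limits of Example 6.6 can be checked as below. This is a sketch under a stated assumption: the upper F-value 2.52 for (40, 30) degrees of freedom is the one used in the example, while the value 2.40 for (30, 40) degrees of freedom is quoted from a standard F-table and is not given in the text. The reciprocal identity for F-values is used to obtain the left-hand tail value, which comes out slightly above the 0.416 printed in the example.

```python
# Sample variances from Example 6.4
var1, var2 = 109.63, 65.99

# Upper 0.5 per cent points of the F-distribution
F_995_40_30 = 2.52   # (40, 30) degrees of freedom, used in Example 6.6
F_995_30_40 = 2.40   # (30, 40) degrees of freedom, assumed from a standard table

# Left-hand tail value from the reciprocal identity (degrees of freedom interchanged)
F_005_40_30 = 1 / F_995_30_40

ratio = var2 / var1
lower = ratio * F_005_40_30
upper = ratio * F_995_40_30
print(round(F_005_40_30, 3), round(lower, 2), round(upper, 2))
```

The small discrepancy in the third decimal of the left-hand F-value does not change the limits when they are rounded to two places.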

If a confidence interval for the ratio of the standard deviations of
two normal populations is desired (that is, if we wish to estimate
$\sigma_2/\sigma_1$), it is appropriate to calculate $L' = \sqrt{L}$ and $U' = \sqrt{U}$ where $L$
and $U$ are defined by Equations (6.22) and (6.23).

6.11  TOLERANCE LIMITS: GENERAL DISCUSSION

One common method used by engineers to specify the quality of
manufactured product is the method of tolerance limits. When such
limits are quoted, it is expected that a certain percentage of the product
will have a quality between the stated limits. For example, suppose
electrical gaps are judged by the characteristic, "transfer time." It is
then desirable to be able to quote two limits, A and B, such that we
are fairly certain that, say, 98 per cent of all gaps produced will exhibit
transfer times between A and B. Such limits clearly provide us with a
measure of the quality of the product under consideration. For certain
weapon applications, it is convenient to be able to set a one-sided
tolerance limit. An example of such a case is the following: 90 per cent
of all Type XYZ batteries will yield an activated life of at least 200
seconds. In general, then, tolerance limits are limits within which we
are highly confident will lie a certain percentage of the individuals of a
statistical population.

To apply tolerance limits in a satisfactory manner, certain conditions
must be met. In summary, the conditions upon which tolerance limits
are based are the following:

(1) All assignable causes of variability must be detected and
    eliminated so that the remaining variability may be
    considered random.

(2) Certain assumptions must be made concerning the nature of
    the statistical population under study.

6.12     TOLERANCE   LIMITS  (TWO-SIDED;  ONE-SIDED) 
FOR  NORMAL  POPULATIONS 

Tolerance limits considered in this section are based on the
assumption that the parent population may be described by a normal
distribution. If the true mean and standard deviation are known, tolerance
limits are formed by adding to and subtracting from the mean some
multiple of the standard deviation. That is, if $\mu$ and $\sigma$ are known,
tolerance limits take the form $\mu \pm z\sigma$ where $z$ is selected from Appendix 3
and depends only on the proportion of the population to be included
within the calculated limits. For example, the limits $\mu \pm 1.645\sigma$ include
90 per cent of a normal population with mean $\mu$ and standard
deviation $\sigma$. One-sided tolerance limits may, of course, be obtained by
considering $\mu + z\sigma$ or $\mu - z\sigma$ as the problem requires.

In a practical situation, $\mu$ and $\sigma$ are unknown. Only estimates, $\bar{X}$
and $s$, are available. While it was true that the limits $\mu \pm 1.645\sigma$ will
include 90 per cent of the population, the same statement cannot be
made concerning $\bar{X} \pm 1.645s$. Just what proportion of the population
will lie between $\bar{X} \pm Ks$ depends on how closely $\bar{X}$ and $s$ estimate $\mu$ and
$\sigma$. Note that $K$ is used here to represent the constant used with $\bar{X}$ and
$s$ in contrast with the $z$ used with $\mu$ and $\sigma$.

Since $\bar{X}$ and $s$, and hence $\bar{X} \pm Ks$, are random variables, it is
impossible to state with certainty that $\bar{X} \pm Ks$ will always contain a
specified proportion, $P$, of the population. That is, it is impossible to
choose $K$ so that the calculated limits will always contain a specified
proportion, $P$, of the population. However, it is possible to determine
$K$ so that in many random samples from a normal population a certain
fraction $\gamma$ of the intervals $(\bar{X} \pm Ks)$ will contain $100P$ per cent or more
of the population. When this notation is used, $P$ is referred to as the
coverage and $\gamma$ as the confidence coefficient. This terminology is used
since we are $100\gamma$ per cent confident that the tolerance range specified
by $\bar{X} \pm Ks$ will include at least $100P$ per cent of the normal population
sampled.

Intuitively, it is reasonable to expect that values of $K$ used with $\bar{X}$
and $s$ will be larger than values of $z$ used with $\mu$ and $\sigma$. It is also clear
that if $K$ is taken large enough, then the probability that $\bar{X} \pm Ks$ will
contain at least $100P$ per cent of the population may be made very
close to 1. However, the smaller $K$ is taken, the more meaningful and
useful the tolerance range becomes. The engineer is thus faced with a
decision: make broad statements with little risk of error or make
precise statements (i.e., a narrow tolerance range) with greater risk of
error. The problem, statistically speaking, becomes that of finding the
smallest value of $K$ consistent with a specified confidence coefficient $\gamma$,
proportion $P$, and sample size $n$.

We must not forget that one-sided tolerance limits are frequently
more appropriate than two-sided tolerance limits. That is, it is often
desirable to specify a single limit such that a given percentage of the
population will be less than (or greater than) this limit. Such a limit
is known as a one-sided tolerance limit and is usually of the form
$\bar{X} + Ks$ (or $\bar{X} - Ks$). Both one-sided and two-sided tolerance limits
for normal populations will be discussed in the following paragraphs.

Table 6.3 is an abbreviated table of $K$ factors for two-sided
tolerance intervals. Values of $K$ taken from this table give a 95 per
cent confidence that at least a fraction $P$ will be included in the
interval $\bar{X} \pm Ks$. Table 6.4 is an abbreviated table of $K$ factors for
one-sided tolerance intervals. Values of $K$ taken from this table give
a 95 per cent confidence that at least a fraction $P$ will be above (below)
$\bar{X} - Ks$ ($\bar{X} + Ks$). Much more extensive tables can be found in Bowker
and Lieberman (2), Eisenhart, Hastay, and Wallis (21), Owen (19),
and Weissberg and Beatty (23).

Example  6.7 

Using the data of Example 6.2, find tolerance limits such that you
are 95 per cent confident of including at least 99 per cent of the sampled
population. These limits are given by

$$\bar{X} \pm Ks = 2206 \pm 5.775(2) = (2194.45,\ 2217.55).$$
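The meaning of a tolerance factor $K$ can be checked by simulation, and the limits of Example 6.7 reproduced along the way. The sketch below is an illustration only: the number of trials, the seed, and the function name `fraction_meeting_coverage` are assumptions, a standard normal population is used without loss of generality, and the factor 3.712 is the Table 6.3 entry for n = 6 and P = 0.90. For each simulated sample the true proportion of the population inside $\bar{X} \pm Ks$ is computed, and the fraction of samples reaching at least P is reported, first with the tabled $K$ and then with the normal deviate 1.645 in its place.

```python
import random
import statistics
from statistics import NormalDist


def fraction_meeting_coverage(k, p=0.90, n=6, trials=20000, seed=7):
    """Fraction of samples whose interval xbar +/- k*s contains at least p of N(0, 1)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    successes = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)
        covered = std_normal.cdf(xbar + k * s) - std_normal.cdf(xbar - k * s)
        if covered >= p:
            successes += 1
    return successes / trials


with_table_k = fraction_meeting_coverage(3.712)  # tabled K for n = 6, P = 0.90
with_z = fraction_meeting_coverage(1.645)        # z that would apply if mu, sigma were known
print(with_table_k, with_z)

# Limits of Example 6.7: n = 6, P = 0.99, K = 5.775, xbar = 2206, s = 2
example_limits = (2206 - 5.775 * 2, 2206 + 5.775 * 2)
print(example_limits)
```

With the tabled factor the fraction comes out near the stated confidence of 0.95, while substituting 1.645 gives a far smaller fraction, which is why $K$ must exceed $z$ when $\mu$ and $\sigma$ are estimated.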

TABLE 6.3-Two-Sided Tolerance Factors

(Factors $K$ such that the probability is 0.95 that at least a proportion $P$
of the distribution will be included between $\bar{X} \pm Ks$ where $\bar{X}$ and $s$ are
computed from a sample of size $n$.)

                          P
   n       0.7500    0.9000    0.9500    0.9900

   5        3.002     4.275     5.079     6.634
   6        2.604     3.712     4.414     5.775
   7        2.361     3.369     4.007     5.248
   8        2.197     3.136     3.732     4.891
   9        2.078     2.967     3.532     4.631
  10        1.987     2.836     3.379     4.433
  17        1.679     2.400     2.858     3.754
  37        1.450     2.073     2.470     3.246
 145        1.280     1.829     2.179     2.864
  ∞         1.150     1.645     1.960     2.576

The  foregoing  discussion  of  tolerance  limits  and  the  K  factors  given 
in  Tables  6.3  and  6.4  depend  squarely  on  the  assumption  of  a  random 
sample  from  a  normal  population.  If  tolerance  limits  are  calculated 
using  these  tables  when  the  sampled  population  is  definitely  non- 
normal,  considerable  error  is  possible. 

TABLE 6.4-One-Sided Tolerance Factors

(Factors K such that the probability is 0.95 that at least a proportion P
of the distribution will lie above (below) $\bar{X} - Ks$ ($\bar{X} + Ks$), where $\bar{X}$ and
s are computed from a sample of size n.)

                          P
    n      0.7500    0.9000    0.9500    0.9900

    5      2.150     3.412     4.212     5.751
    6      1.895     3.008     3.711     5.065
    7      1.733     2.756     3.400     4.644
    8      1.618     2.582     3.188     4.356
    9      1.532     2.454     3.032     4.144
   10      1.465     2.355     2.911     3.981
   17      1.220     2.002     2.486     3.414
   37      1.014     1.717     2.149     2.972
  145      0.834     1.481     1.874     2.617
   ∞       0.674     1.282     1.645     2.326

6.13      DISTRIBUTION-FREE TOLERANCE LIMITS

Sometimes it is desirable to set tolerance limits that do not depend
on the assumption of normality. That is, we recognize that it is not
always possible to justify the assumption of a normal distribution. If
we are dealing with a statistical variable that can be described by a
continuous distribution, one very simple set of distribution-free
tolerance limits is specified by $X_{\min}$ and $X_{\max}$, the smallest and largest
values in a random sample of size n. Clearly, the confidence in such
limits will depend on n. Persons interested in reading further on
this topic are referred to Murphy (16), Ostle (17), Owen (18), and
Wilks (24).
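
How the confidence depends on n can be made concrete. The text does not give a formula; the sketch below uses the standard result for a continuous population that the interval from the sample minimum to the sample maximum covers at least a fraction P with confidence 1 - nP^(n-1) + (n-1)P^n. The function names are hypothetical.

```python
def minmax_tolerance_confidence(n, p):
    """Confidence that (X_min, X_max) from a random sample of size n
    covers at least a fraction p of a continuous population."""
    return 1 - n * p ** (n - 1) + (n - 1) * p ** n

def smallest_sample_size(p, confidence):
    """Smallest n for which the min-max limits reach the stated confidence."""
    n = 2
    while minmax_tolerance_confidence(n, p) < confidence:
        n += 1
    return n

# How quickly confidence grows with n when P = 0.90.
for n in (10, 20, 40, 80):
    print(n, round(minmax_tolerance_confidence(n, 0.90), 3))
```

Calling `smallest_sample_size(p, confidence)` shows how large a sample is needed before the extreme observations can serve as tolerance limits at a chosen coverage and confidence.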

Problems 

6.1  As a physicist or chemist, you would soon become acquainted with
such "constants" as Planck's constant and Euler's constant. To
consider a specific case, Planck's constant is defined as "the quantum of
energy radiated from black bodies ÷ frequency of radiation." Suppose
you were attempting to find the value of this constant by
experimental methods. You ran 6 experiments and obtained the following
estimates of h (Planck's constant):

$6.53 \times 10^{-27}$
$6.54 \times 10^{-27}$
$6.58 \times 10^{-27}$
$6.56 \times 10^{-27}$
$6.55 \times 10^{-27}$
$6.55 \times 10^{-27}$

What inferences can you make about the true value of h? Be careful
to state explicitly any assumptions you make.

6.2  Given that $n = 9$, $\bar{Y} = 20$, and

$\sum y^2 = 288,$

calculate a 95 per cent confidence interval for $\mu$ on the assumption
that you have a random sample from a normal population. Interpret
this confidence interval.

6.3  From a random sample of 100 aptitude test scores drawn from a
normal population, the 95 per cent confidence interval for $\mu$ is
calculated to be (45, 55). Fifty other random samples, each of size 100, are
drawn from the same population, but only 10 of their means fall
within the above limits. Is it not correct to expect 95 per cent of such
sample means to be between 45 and 55? Explain your answer.

6.4  A random sample of 25 observations from a normal population had
a mean of 20 and a sum of squares of the deviations from the mean of
2400. Compute and interpret the 90 per cent confidence interval for
the population mean.

6.5  It has been reasonably well established that a particular machine
produces nails whose length is a random variable with a normal
distribution. A random sample of 5 nails yields the following results:

1.14 inches
1.15 inches
1.14 inches
1.12 inches
1.10 inches

Calculate 99 per cent confidence limits for $\mu$.

6.6  The density of each of 27 explosive primers was determined, with the
sample average being 1.53 and the sample standard deviation being
0.04. Determine a 90 per cent upper confidence limit for $\mu$.

6.7  The firing of 101 rockets yielded an average range (i.e., distance
flown) of 3000 yards and a standard deviation of 40 yards. Determine
an 85 per cent lower confidence limit for $\mu$.

6.8  Using the data of Problem 6.1, compute a 95 per cent confidence
interval for $\sigma^2$.

6.9  Using the data of Problem 4.11, compute a 90 per cent confidence
interval for $\sigma$.

6.10  If in a sample of 14 bolts the estimate of the population standard
deviation of their lengths was s = .021, what are the 95 per cent
confidence limits for the standard deviation of the population ($\sigma$)? What
assumptions must be made to determine these limits?

6.11  Using the data of Problem 6.10, determine a 90 per cent upper
confidence limit for $\sigma$.

6.12  Using the data of Problem 6.5, determine a 97.5 per cent upper
confidence limit for $\sigma$.

6.13  Using the data of Problem 6.6, determine a 90 per cent upper
confidence limit for $\sigma$.

6.14  In 1954 the mean earnings of 68 physicians in communities from
10,000 to 25,000 was $13,944, with s = $40ia. Find the 90 per cent
confidence limits for the population standard deviation. State your
assumptions.

6.15  In a random sample of 400 farm operators, 65 per cent were owners
and 35 per cent were nonowners. Determine 95 per cent confidence
limits for the true percentage of farm owners in the population of
operators sampled.

6.16  In a random sample of 600 light bulbs, 12 were defective. Determine
a 95 per cent upper confidence limit for the true fraction defective.

6.17  Using the data of Problem 4.1, and the results found in Problem
4.10, determine: (a) 95 per cent confidence limits for $\mu$ and (b) 80 per
cent confidence limits for $\sigma^2$. State all assumptions.

6.18  Using the data of Problem 4.2 and the results found in Problem 4.10,
determine: (a) a 99 per cent upper confidence limit for $\mu$ and (b) a 95
per cent upper confidence limit for $\sigma$. State all assumptions.

6.19  Using the data of Problem 4.3 and the results found in Problem
4.10, determine 50 per cent confidence limits for each of the means
and 50 per cent upper confidence limits for each of the standard
deviations. State all assumptions.

6.20  You  are  engaged  as  a  testing  engineer  in  an  electrical  manufacturing 
plant.  One  of  the  products  being  produced  is  an  electric  fuse,  and  the 
most  important  characteristic  of  this  fuse  is  the  length  of  time  before 
it  "blows"  when  subjected  to  a  specified  load.  A  testing  program  was 
undertaken  and  the  following  sample  data  (in  seconds)  were  obtained. 


Day 1     Day 2

  42        69
  45       109
  68       113
  72       118
  90       153

Place  a  90  per  cent  confidence  interval  on  the  true  difference  between 
the  means  of  the  two  different  days'  productions.  Assume  that  each 
day's  production  may  be  represented  by  a  normal  population.  State 
all  other  assumptions  which  you  make  and  interpret  your  numerical 
answer. 

6.21  Given that

$\bar{Y}_1 = 75$,  $n_1 = 9$,  $\sum_{i=1}^{n_1} y_{1i}^2 = 1482$,  $\bar{Y}_2 = 60$,  $n_2 = 16$,  $\sum_{j=1}^{n_2} y_{2j}^2 = 1830$,

and assuming that the 2 samples were randomly selected from 2
normal populations in which $\sigma_1^2 = \sigma_2^2$, calculate an 80 per cent
confidence interval for $\mu_1 - \mu_2$.

6.22  Two  barley  varieties  have  been  grown  at  a  number  of  locations  over 
several  years  in  an  area  and  their  general  adaptability  is  under  dis 
cussion.  Which  variety  would  you  select  for  the  area  on  the  basis  of  the 
following  yields  in  bushels  per  acre? 

Trebi— 41.2,  19.3,  45.5,  63.9,  63.8,  44.2,  42.5,  53.0. 
Svanota— 39.4,  30.8,  44.5,  51.5,  41.1,  26.5,  35.7. 

Place  confidence  limits  on  the  difference  between  the  means. 

6.23  Two  varieties  of  tomato   were  experimented  with  concerning  their 
fruit-producing  abilities.  The  study  was  done  in  a  greenhouse  and, 
because  of  extreme  variations  (among  locations  within  greenhouses) 
of  temperature,  light  quality,  and  light  intensity,  the  experimental 
plants  were  placed  in  pairs  (one  of  each  variety)  at  several  locations. 
The  following  data  were  obtained: 

WEIGHTS OF RIPE FRUITS FOR TWO VARIETIES OF TOMATO
(in pounds)

                  Variety            Difference
Location        A         B          A - B = D

    1         3.03      2.28           0.75
    2         3.10      2.68           0.42
    3         2.35      2.17           0.18
    4         3.86      3.56           0.30
    5         3.91      3.73           0.18
    6         2.65      1.48           1.17
    7         1.72      1.85          -0.13
    8         2.30      1.86           0.44
    9         2.70      2.76          -0.06
   10         3.60      2.68           0.92

  Total      29.22     25.05           4.17

Determine 90 per cent confidence limits for the true difference between
the expected weights of the two varieties. State all assumptions.

6.24  Using the data of Problem 6.20, obtain 95 per cent confidence limits
for $\sigma_1^2/\sigma_2^2$.

6.25  Using the data of Problem 6.21 and ignoring the assumption used
there, namely, that $\sigma_1^2 = \sigma_2^2$, obtain 99 per cent confidence limits for

6.26  Using the data of Problem 6.22, obtain 95 per cent confidence limits
for $\sigma_S/\sigma_T$.

6.27  Using the data of Problem 6.2, determine with 95 per cent
confidence: (a) 95 per cent tolerance limits and (b) an upper 99 per cent
tolerance limit.

6.28  Using the data of Problem 6.4, determine with 95 per cent confidence:
(a) 75 per cent tolerance limits and (b) a lower 90 per cent tolerance
limit.

6.29  Using the data of Problem 6.6, determine, with 95 per cent
confidence, a 99 per cent upper tolerance limit on the densities.

6.30  Using the data of Problem 6.7, determine, with 95 per cent
confidence, a 90 per cent lower tolerance limit on the ranges.

6.31  Consider the following definitions:

(a)  If the expected value of an estimator does not equal the true value
being estimated, the difference between the expected value and
the true value is known as the bias of the estimator.

(b)  If an estimator has 0 bias, it is said to be accurate.

(c)  If an estimator has a small bias, it is said to be relatively accurate.

(d)  If an estimator has a large bias, it is said to be inaccurate.

(e)  The precision of an estimator is a measure of the repeatability of
the estimator. Therefore, precision may be expressed in terms of
the variance of an estimator, with a large variance signifying lack
of precision and a small variance signifying high precision.
Obviously, absolute precision implies a 0 variance, an ideal seldom (if
ever) achieved. (NOTE: Sometimes a measure of precision is
referred to as a measure of reliability. Because the word
"reliability" has another meaning in engineering, this is unfortunate.
However, as with many expressions, the phrase is now a part of
the language of statistics and will therefore continue to be used.)

It should be observed that an estimator may be: (1) both precise and
accurate, (2) neither precise nor accurate, (3) precise but not
accurate, or (4) accurate but not precise.

(a)  Discuss the foregoing concepts and definitions relative to the
contents of Section 6.1.

(b)  Discuss these ideas taking cognizance of costs and other economic
and physical limitations which continually plague the researcher.

(c)  Discuss the accuracy and precision of the various estimators that
have been introduced so far in this text.

CHAPTER 7

STATISTICAL  INFERENCE: 
TESTING  HYPOTHESES 

7.1      GENERAL  CONSIDERATIONS 

A  HYPOTHESIS  is  defined  by  Webster  as  "a  tentative  theory  or  supposi 
tion  provisionally  adopted  to  explain  certain  facts  and  to  guide  in  the 
investigation  of  others."  A  statistical  hypothesis  is  a  statement  about 
a  statistical  population  and  usually  is  a  statement  about  the  values  of 
one or more parameters of the population. For example, the following
could be taken as hypotheses: (1) the probability of a 1 on a toss of a
certain die is 1/6, (2) the mean height of American adult males is 5 feet
8.4 inches, (3) the mean length of a certain brand of 6-inch rulers is
5.99 inches and the standard deviation is 0.02 inch.

It is frequently desirable to test the validity of such hypotheses. In
order to do this, an experiment is conducted and the hypothesis is
rejected if the results obtained from the experiment are improbable
under this hypothesis. If the results are not improbable, the hypothesis
is accepted. For example, we might test hypothesis (1) above by
tossing the die 600 times. Intuitively, it is evident that if 600 1's are
obtained, the result is improbable under the hypothesized probability of
1/6, and the hypothesis should be rejected. On the other hand, if 100 1's
were observed, this result would not be improbable and the hypothesis
would undoubtedly be accepted. When results such as these are
obtained, intuition (combined with common sense) is sufficient to decide
whether to accept the hypothesis. However, in actual practice,
experimental results do not usually lead to such obvious conclusions; hence
the need for statistical tests. It should be pointed out
that although we accept or reject a hypothesis, we have not proved or
disproved the hypothesis.

In  testing  hypotheses,  there  are  two  types  of  errors  which  can  be 
made.  These  are  called: 

Type I error: the rejection of a hypothesis which is true.

Type II error: the acceptance of a hypothesis which is false.

To aid the reader in comprehending the nature of statistical hypotheses,
decisions, and the various types of error, Table 7.1 has been found
helpful.

When setting up an experiment to test a hypothesis, it is desirable
to minimize the probabilities of making these errors. In order to make
it easier to talk about these errors and their probabilities, the
probability of making a Type I error is designated as $\alpha$ and the probability
of making a Type II error is designated as $\beta$. It should also be noted
that $100\alpha$ (in per cent) is commonly referred to as the significance level.
What constitutes suitably small values of $\alpha$ and $\beta$? This is not a
question which can be answered unequivocally for all situations. Obviously
the values of $\alpha$ and $\beta$ should depend on the consequences of making
Type I and II errors, respectively. For example, if we are considering
the purchase of a lot of batteries (or some other very critical item) for
use in weapons, we might hypothesize that the lot is of satisfactory
quality. (Actually we should state this hypothesis in more precise
terms.) If this hypothesis is true and we reject it, no great harm has
been done since we can always wait for the next lot (assuming that we
are not in a hurry). Consequently $\alpha$ can be relatively large (perhaps
0.25 or larger). On the other hand, if the hypothesis is false and we
accept it, the result may be a large number of dud weapons. Since this
is very undesirable, $\beta$ should be quite small (maybe 0.01 or less). It
should be pointed out that the supplier might feel differently about
these probabilities.

TABLE 7.1-Definition of the Types of Errors Associated With Tests
of Hypotheses

                                      True Situation
Decision                  Hypothesis is true    Hypothesis is false

Accept the hypothesis     No error              Type II error
Reject the hypothesis     Type I error          No error

An important consideration in discussing the probabilities of Type
II errors is the "degree of falseness" of a false hypothesis. In a given
experiment, if the hypothesis is false but is nearly true (such as
hypothesizing that a probability is 1/2 when actually it is 1.0001/2), $\beta$ could be
quite large. However, if the hypothesis is grossly false (such as
hypothesizing that a probability is 1/2 when it is actually 1), $\beta$ should be
much smaller. For a given experiment testing a specific hypothesis,
the value of $1 - \beta$ is known as the power of the test. Since the power
depends on the difference between the value of the parameter specified
by the hypothesis and the actual value of the parameter, where the
latter is unknown, $1 - \beta$ should be expressed as a function of the true
parameter. Such a function is known as a power function and is
expressed as $1 - \beta(\theta)$, where $\theta$ represents the true parameter value. The
complementary function, $\beta(\theta)$, is known as the operating characteristic
(OC) function.

Before proceeding further with the details of testing hypotheses, a
few more remarks of a general nature are in order. It is good practice
not only to state the hypothesis to be tested (denoted by H) but also
to state the alternative(s) to H (denoted by A). This is not only good
procedure; it also aids in the determination of the regions of acceptance
and rejection when considering the sample space of all possible values of
the test statistic. Incidentally, the rejection region is frequently referred
to as the critical region. Using the notation of this paragraph, it is seen
that $\alpha = P(\text{reject } H \mid H \text{ is true})$ and $\beta = P(\text{accept } H \mid A \text{ is true})$.

Example  7.1 

Consider a simple hypothesis, $H: \mu = \mu_0$, against a single alternative,
$A: \mu = \mu_1$, where we are dealing with a normal population with known
variance, $\sigma^2$. Let the decision to reject H (accept A) or to accept H
(reject A) be based on a single observation obtained at random from the
population under examination. If the random observation is less than
C (see Fig. 7.1), H will be accepted; if the random observation is greater
than or equal to C, H will be rejected. That is, $X \ge C$ constitutes the
rejection or critical region. The probabilities $\alpha$ and $\beta$ are represented by
the shaded and cross-hatched areas, respectively. Clearly, besides
depending on the choice of C, $\alpha$ depends on the hypothesis under test
(frequently called the null hypothesis) while $\beta$ depends both on the null
hypothesis and on the alternative hypothesis.
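
The two areas described in Example 7.1 can be computed directly from normal distribution functions. The sketch below uses Python's standard-library `statistics.NormalDist`; the numerical values of the means, standard deviation, and cutoff C are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def error_probabilities(mu0, mu1, sigma, c):
    """Return (alpha, beta) for the rule 'reject H if X >= C'."""
    alpha = 1 - NormalDist(mu0, sigma).cdf(c)  # P(X >= C | H is true)
    beta = NormalDist(mu1, sigma).cdf(c)       # P(X <  C | A is true)
    return alpha, beta

# Hypothetical setting: H: mu = 0, A: mu = 3, sigma = 1, cutoff C = 1.645.
alpha, beta = error_probabilities(mu0=0.0, mu1=3.0, sigma=1.0, c=1.645)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Moving C to the right lowers alpha and raises beta, which is the trade-off the shaded and cross-hatched areas depict.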


FIG. 7.1. Graphical illustration of the acceptance and rejection regions in
Example 7.1. (The figure shows the distribution assuming H is true and the
distribution assuming A is true, with the "accept H" and "reject H" regions
marked.)

Example  7.2 

Modify Example 7.1 to the following extent: Consider $H: \mu = \mu_0$
versus the composite alternative $A: \mu > \mu_0$. In this situation $\alpha$ is the
same as before but $\beta$ is now better denoted by $\beta(\mu) = P(\text{accept } H \mid \mu)$.
Clearly $\beta(\mu)$ changes as we think of the "alternative distribution" in
Figure 7.1 taking all possible positions for which $\mu_1 > \mu_0$. Thus, an OC
curve similar to the one shown in Figure 7.2 is generated.

Example  7.3 

Consider a further modification of Example 7.1, namely, $H: \mu = \mu_0$
versus the alternative $A: \mu \ne \mu_0$. The acceptance and rejection regions
might be as shown in Figure 7.3, namely, reject if $X < C_1 = \mu_0 - k\sigma$ or
if $X > C_2 = \mu_0 + k\sigma$ and accept if $C_1 \le X \le C_2$. Only the distribution of
the test statistic under H is shown. The distribution under A may be
visualized if the reader thinks of sliding the distribution shown to the
left and to the right. For this situation, an OC curve similar to the one
in Figure 7.4 would result.
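
The OC curves of Examples 7.2 and 7.3 can be traced numerically by evaluating the probability of acceptance as the true mean slides along the axis. The following is a sketch under assumed values (mean 0 under H, sigma = 1, C = 1.645, k = 1.96); none of these numbers come from the text.

```python
from statistics import NormalDist

def oc_one_sided(mu, sigma, c):
    """beta(mu) = P(X < C | mu): accept H when the observation is below C."""
    return NormalDist(mu, sigma).cdf(c)

def oc_two_sided(mu, mu0, sigma, k):
    """beta(mu) = P(C1 <= X <= C2 | mu), with C1, C2 = mu0 -/+ k*sigma."""
    c1, c2 = mu0 - k * sigma, mu0 + k * sigma
    dist = NormalDist(mu, sigma)
    return dist.cdf(c2) - dist.cdf(c1)

mu0, sigma = 0.0, 1.0        # hypothetical values for illustration
c, k = 1.645, 1.96           # hypothetical cutoffs (alpha near 0.05)
print("   mu  one-sided  two-sided")
for shift in range(-3, 4):
    mu = mu0 + shift * sigma
    print(f"{mu:5.1f}  {oc_one_sided(mu, sigma, c):9.3f}  "
          f"{oc_two_sided(mu, mu0, sigma, k):9.3f}")
```

The one-sided column falls steadily as the true mean increases, while the two-sided column peaks at the hypothesized mean and falls off symmetrically on either side.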


FIG. 7.2. Type of OC curve to be expected in situations
similar to Example 7.2.

FIG. 7.3. Graphical illustration of the acceptance
and rejection regions in Example 7.3.

FIG. 7.4. Type of OC curve to be expected in situations
similar to Example 7.3.

7.2      ESTABLISHMENT OF TEST PROCEDURES

When  establishing  a  test  procedure  to  investigate,  statistically,  the 
credibility  of  a  stated  hypothesis,  there  are  several  factors  that  must 
be  considered.  Assuming  a  clear  statement  of  the  problem  has  been 
formulated  and  that  an  associated  hypothesis  has  been  stated  in  mathe 
matical  terms,  these  are : 

(1)  The  nature  of  the  experiment  that  will  produce  the  data  must 
be  defined. 

(2)  The  test  statistic  must  be  selected.  That  is,  the  method  of  ana 
lyzing  the  data  should  be  specified. 

(3)  The  nature  of  the  critical  region  must  be  established. 

(4)  The size of the critical region (that is, $\alpha$) must be chosen.

(5)  A value should be assigned to $\beta(\theta)$ for at least one value of $\theta$
other than the value of $\theta$ specified by H. This is equivalent to
stating what difference between the hypothesized value of the
parameter and the true value of the parameter must be
detectable, and with what probability we must be confident of
detecting it.

(6)  The  size  of  the  sample  (i.e.,  the  number  of  times  the  experi 
ment  will  be  performed)  must  be  determined. 

It  should  be  clear  that  these  steps  will  not  always  be  taken  in  the  order 
listed.  Not  all  of  the  steps  are  independent,  and  frequently  it  is  neces 
sary  to  reconsider  (several  times)  the  various  steps  until  a  reasonable 
test  procedure  is  formulated.  More  will  be  said  on  this  subject  later. 
For  now,  some  explanatory  examples  will  probably  be  of  more  value 
than  additional  generalizations. 


Example  7.4 

With respect to a specific coin we have $H: P(\text{heads}) = p = 0.5$ and
$A: p \ne 0.5$. The experiment will consist of tossing the coin some number
of times and counting the number of times heads occurs; H will be
rejected if either a very small or a very large proportion of heads
is observed. Let $\alpha = 0.05$. Ignore $\beta(p)$ for the moment. Consider n = 5
and the rejection region to consist of either no heads or all heads. Then
$P(\text{rejection} \mid p = 0.5) = 1/16 = 0.0625$. Since this is greater than $\alpha = 0.05$,
a larger number of tosses is required. Let us try n = 6, keeping the same
rejection region. Now we have $P(\text{rejection} \mid p = 0.5) = 0.03125$, which is
less than $\alpha = 0.05$. Thus an acceptable test procedure has been
developed. (NOTE: The probabilities of rejection given were, of course,
calculated using $f(x) = C(n, x)\,p^x (1 - p)^{n - x}$.)
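
The rejection probabilities quoted in Example 7.4 follow from the binomial probability function. A short sketch, with hypothetical function names, reproduces them for n = 5 and n = 6:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x heads in n independent tosses."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def rejection_probability(n, p):
    """P(no heads or all heads), the rejection region of Example 7.4."""
    return binomial_pmf(0, n, p) + binomial_pmf(n, n, p)

for n in (5, 6):
    print(n, rejection_probability(n, 0.5))
# prints 5 0.0625 and then 6 0.03125, matching the text
```

Evaluating `rejection_probability(n, p)` at values of p other than 0.5 gives the power of the same test against those alternatives.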


Example  7.5 

In Example 7.4 we derived the test: "Toss the coin six times and
reject H if either zero or six heads occurs; otherwise, accept H." Clearly,
other rejection regions might have been chosen together with different
values of n, as long as $P(\text{rejection} \mid p = 0.5) \le \alpha$. What we found in
Example 7.4 was the smallest value of n for the specified rejection
region. The reader should investigate some of these other possibilities.

Example  7.6 

Now consider $\beta(p)$ for the test derived in Example 7.4. For selected
values of p the approximate values of $\beta(p) = 1 - p^6 - (1 - p)^6$ are given
in Table 7.2. (NOTE: Only approximate values are given because the
exact answers involve an unnecessary number of decimal places. For
example, for p = 0.5,

$1 - P(\text{rejection} \mid p = 0.5) = 1 - 0.03125 = 0.96875 \approx 0.97.$)

If one did not consider the derived test to be discriminating enough (as
evidenced by the OC curve), the discriminatory power could be
increased by: (1) changing the sample size and the definition of the
critical region or (2) concocting an entirely different test procedure and test
statistic. It is clear that we are faced with just this situation in the
present case. The test derived in Example 7.4 is good for detecting
two-headed or two-tailed coins (nearly as good as looking at both sides of
the coin) but is poor for detecting slightly, or even moderately, biased
coins. Thus a modified or new test is required.

TABLE 7.2-Selected Values of the OC Function for Example 7.6

     p      Approximate Values of $\beta(p)$

    0               0
    0.1             0.47
    0.2             0.74
    0.3             0.88
    0.4             0.95
    0.5             0.97
    0.6             0.95
    0.7             0.88
    0.8             0.74
    0.9             0.47
    1.0             0
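
The entries of Table 7.2 can be regenerated from the OC function of Example 7.6, the probability of accepting H when six tosses give neither zero nor six heads. A minimal sketch (the function name is hypothetical):

```python
def oc_coin_test(p, n=6):
    """beta(p): probability that n tosses give neither 0 nor n heads."""
    return 1 - p ** n - (1 - p) ** n

# Tabulate beta(p) at p = 0, 0.1, ..., 1.0, rounded to two decimals.
for i in range(11):
    p = i / 10
    print(f"{p:.1f}   {oc_coin_test(p):.2f}")
```

The flat top of the resulting curve around p = 0.5 is what makes the test poor at detecting moderately biased coins.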

Example  7.7 

Consider the following modification of Example 7.4, namely,
$H: p \ge 0.5$ and $A: p < 0.5$. The experiment will remain the same but the
regions of acceptance and rejection will change. Obviously, the
occurrence of many heads does not tend to deny H, so the rejection region
will be only that region in which few heads occur. That is, a one-tailed
test (like a one-sided confidence limit) is required. Proceeding as before,
it is found that a possible test is: "Toss the coin five times. If no heads
occur, reject H; otherwise, accept." This gives $\alpha = 0.03125$.

7.3      NORMAL POPULATION; $H: \mu = \mu_0$ VERSUS $A: \mu \ne \mu_0$

Suppose that we wish to know if a random sample could be from a
normal population with mean $\mu_0$. More specifically, assuming
normality, the hypothesis $H: \mu = \mu_0$ will be tested relative to the alternative
$A: \mu \ne \mu_0$. For a chosen $\alpha$, the procedure is to compute

$t = (\bar{X} - \mu_0)/s_{\bar{X}} = \sqrt{n}(\bar{X} - \mu_0)/s \qquad (7.1)$

and reject H if $t < -t_{(1-\alpha/2)(n-1)}$ or if $t > t_{(1-\alpha/2)(n-1)}$; otherwise, accept H.

Example 7.8

A metallurgist made four determinations of the melting point of
manganese: 1269°, 1271°, 1263°, and 1265°C. Are these in accord with a
hypothesized value of 1260°C? Here the hypothesis is $H: \mu = 1260$, the
alternative is $A: \mu \ne 1260$, and

$t = (\bar{X} - \mu_0)/s_{\bar{X}} = (1267 - 1260)/1.826 = 3.83$

is computed. Since $t_{0.975(3)} = 3.182$ (a 5 per cent significance level is
assumed), H is rejected and it is concluded that the hypothesized value
is incorrect. By using a 5 per cent significance level, it is recognized that
the probability of Type I error will be no greater than 0.05. That is,
there is a maximum risk of 5 per cent in rejecting the hypothesis that
$\mu = 1260$ if the hypothesis is really true.
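
The computation in Example 7.8 can be reproduced as a check. In this sketch the function name is hypothetical, and the critical value 3.182 is taken from the text rather than computed, since the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = sqrt(n) * (sample mean - mu0) / s, as in Equation (7.1)."""
    n = len(sample)
    return sqrt(n) * (mean(sample) - mu0) / stdev(sample)

melting_points = [1269, 1271, 1263, 1265]
t = one_sample_t(melting_points, 1260)

t_critical = 3.182  # t_{0.975(3)}, quoted in the text
reject = abs(t) > t_critical
print(f"t = {t:.2f}, reject H: {reject}")
```

Here `stdev` divides by n - 1, so the standard error of the mean is s divided by the square root of n, about 1.826 for these four readings.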

It would also be of interest to examine the OC curve for the test
proposed above. However, it would be necessary to prepare OC curves for
many values of $\alpha$ and n. Also, the formula for $\beta$ associated with
"Student's" t-test involves the noncentral t-distribution which must
be considered beyond the scope of this text. Thus, the reader is referred
to examples of such curves given in Bowker and Lieberman (3).

7.4      NORMAL POPULATION; $H: \mu \le \mu_0$ VERSUS $A: \mu > \mu_0$, OR
$H: \mu \ge \mu_0$ VERSUS $A: \mu < \mu_0$

A more common situation than that considered in Section 7.3 is the
case of a one-sided alternative. For example, a manufacturer produces
wire cable which must have a breaking strength not less than 1500
pounds. A new and cheaper process for making the cable is discovered
and he wishes to change to the new process, provided that cable so
produced will have a mean breaking strength greater than 1500 pounds.
Thus he could formulate the hypothesis $H: \mu \le \mu_0 = 1500$ pounds as
opposed to the alternative $A: \mu > 1500$ pounds. The hypothesis H
would be rejected if a sample of the new cable presented sufficient
evidence that $\mu$ actually exceeds 1500 pounds.

In general terms, when testing the hypothesis $H: \mu \le \mu_0$ versus
$A: \mu > \mu_0$, the procedure is to calculate

$t = (\bar{X} - \mu_0)/s_{\bar{X}} = \sqrt{n}(\bar{X} - \mu_0)/s \qquad (7.2)$

and reject H if $t > t_{(1-\alpha)(n-1)}$; otherwise, accept H.

If the hypothesis $H: \mu \ge \mu_0$ versus $A: \mu < \mu_0$ is under investigation,
the test statistic is calculated as in Equation (7.2), but now H is
rejected only if $t < -t_{(1-\alpha)(n-1)}$.

Example  7.9 

A manufacturer of television sets purchases tubes from one of the
few large suppliers of such specialized material. He will not purchase
tubes, however, unless it can be demonstrated that the average length
of life will exceed 500 hours. A random sample of 9 tubes is subjected to
a "life test" and the following values are obtained: $\bar{X} = 600$ and $s^2 = 2500$.
It is assumed that the "lengths of life" (measured in hours) are normally
distributed. Shall the hypothesis $H: \mu \le 500$ be accepted? For this
example, $t = (600 - 500)/16.67 = 6.00$ far exceeds $t_{0.95(8)} = 1.860$, and the
null hypothesis is rejected. As can be seen, a 5 per cent significance level
was used. This means that the maximum risk of rejecting $H: \mu \le 500$
when H is really true is 5 per cent. Therefore, the manufacturer of
television sets will undoubtedly purchase tubes from this supplier.

The reader is again referred to Bowker and Lieberman (3) for sample
OC curves related to these test procedures.

7.5      NORMAL POPULATION; $H: \sigma^2 = \sigma_0^2$ VERSUS $A: \sigma^2 \ne \sigma_0^2$

Suppose that we have a sample of size n drawn randomly from a
normal population and some predetermined value of the variance is to
be substantiated or refuted; i.e., we wish to test the hypothesis
$H: \sigma^2 = \sigma_0^2$ as opposed to the alternative $A: \sigma^2 \ne \sigma_0^2$. If a probability, $\alpha$,
of making a Type I error has been chosen, i.e., a significance level of
$100\alpha$ per cent has been selected, the hypothesis H will be accepted if

$\chi^2_{(\alpha/2)(n-1)} < \sum (X - \bar{X})^2/\sigma_0^2 < \chi^2_{(1-\alpha/2)(n-1)}. \qquad (7.3)$

Otherwise, H will be rejected.

Example  7*10 

Consider the data given in Example 7.8. Do these values support the
hypothesis that, if repeated measurements are assumed to be normally
distributed, the true variance of all such measurements is equal to 2?
Here the hypothesis is $H: \sigma^2 = 2$ and the alternative is $A: \sigma^2 \ne 2$. It
is determined that $\sum (X - \bar{X})^2/\sigma_0^2 = 40/2 = 20$. Since $\chi^2_{0.025(3)} = 0.216$ and
$\chi^2_{0.975(3)} = 9.35$, we see that H is rejected. You will note that once again a
probability of Type I error equal to 0.05 was chosen. That is, we have
run a maximum risk of 5 per cent of rejecting a true hypothesis.

If we wish to determine the OC curve for this test, the required
values can be calculated using

$$\beta = P\{(\sigma_0^2/\sigma^2)\,\chi^2_{(\alpha/2)(n-1)} < \chi^2 < (\sigma_0^2/\sigma^2)\,\chi^2_{(1-\alpha/2)(n-1)}\}.$$

The reader is referred to Bowker and Lieberman (3) for examples of
OC curves associated with this test.


7.6      NORMAL POPULATION; $H: \sigma^2 \le \sigma_0^2$ VERSUS $A: \sigma^2 > \sigma_0^2$,
OR $H: \sigma^2 \ge \sigma_0^2$ VERSUS $A: \sigma^2 < \sigma_0^2$

It is usually more realistic to consider the hypothesis that the
population variance is less than or equal to some particular value than to
consider the hypothesis that it equals some value. This is so because,
in general, a small variance is considered to be desirable. In such a case,
the hypothesis $H: \sigma^2 \le \sigma_0^2$ is formulated as opposed to
$A: \sigma^2 > \sigma_0^2$. The hypothesis $H$ will be rejected only if
$\chi^2 = \sum (X - \bar{X})^2/\sigma_0^2 > \chi^2_{(1-\alpha)(n-1)}$.

Should $H: \sigma^2 \ge \sigma_0^2$ (as opposed to $A: \sigma^2 < \sigma_0^2$)
be under investigation, the rejection region would be
$\chi^2 = \sum (X - \bar{X})^2/\sigma_0^2 < \chi^2_{\alpha(n-1)}$.

Sample OC curves may be observed in Bowker and Lieberman (3)
for $\alpha = 0.05$ and $\alpha = 0.01$.

Example 7.11

Consider the data of Table 7.3 which were obtained from a random
sample of 80 bearings. To test (using $\alpha = 0.05$)
$H: \sigma^2 \le 0.00005$ versus $A: \sigma^2 > 0.00005$, we calculate

$$\chi^2 = \sum x^2/0.00005 = 0.000474/0.00005 = 9.48,$$

where $x = X - \bar{X}$. Since this does not exceed
$\chi^2_{.95(79)} = 100.7$, we are unable to reject $H$.


TABLE 7.3-Number of Bearings Observed With the Indicated Diameters

Diameter (Inches)     Number
      3.573              4
      3.574              2
      3.575              9
      3.576             12
      3.577             12
      3.578             10
      3.579              9
      3.580              9
      3.581              8
      3.582              4
      3.583              1
      Total             80
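
The calculation in Example 7.11 can be checked directly from the grouped data of Table 7.3. The following is only a sketch: the variable names are illustrative, and the critical value 100.7 is simply the figure quoted in the example rather than something computed here.

```python
# Grouped data from Table 7.3: bearing diameters (inches) and their counts.
diameters = [3.573, 3.574, 3.575, 3.576, 3.577, 3.578,
             3.579, 3.580, 3.581, 3.582, 3.583]
counts = [4, 2, 9, 12, 12, 10, 9, 9, 8, 4, 1]

sigma0_sq = 0.00005          # hypothesized upper bound on the variance
critical_value = 100.7       # chi-square .95 point with 79 d.f., as quoted in the text

n = sum(counts)
mean = sum(d * c for d, c in zip(diameters, counts)) / n

# Sum of squared deviations about the sample mean, weighted by frequency.
sum_sq = sum(c * (d - mean) ** 2 for d, c in zip(diameters, counts))

chi2 = sum_sq / sigma0_sq
reject = chi2 > critical_value   # one-tailed test of H: variance <= sigma0_sq

print(f"n = {n}, sum of squares = {sum_sq:.6f}, chi-square = {chi2:.2f}, reject H: {reject}")
```

The statistic comes out near 9.5, far below the critical value, which agrees with the conclusion that $H$ cannot be rejected.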

7.7      BINOMIAL POPULATION; $H: p = p_0$ VERSUS $A: p \ne p_0$

The coin-tossing experiment discussed in Example 7.4 illustrates the
type of problem to be considered in this section. However, in practice,
one is usually given the sample size and asked to determine the
rejection region, rather than (as in Example 7.4) being asked to find the
smallest sample size that is consistent with a specified rejection region.
For example, for a fixed sample size, $n$, the acceptance and rejection
regions are determined by solving

$$\sum_{x=0}^{L} C(n, x)\,p_0^x (1 - p_0)^{n-x} = \alpha/2 \tag{7.4}$$

and

$$\sum_{x=U}^{n} C(n, x)\,p_0^x (1 - p_0)^{n-x} = \alpha/2 \tag{7.5}$$

for $L$ and $U$. The acceptance region defined by these two equations is
the set of positive integers between, but not including, $L$ and $U$.

Unfortunately, it is usually impossible to find integral values of $L$
and $U$ to satisfy Equations (7.4) and (7.5). Therefore, it is customary
to choose those values of $L$ and $U$ which make the value of each of the
summations as large as possible without exceeding $\alpha/2$. Occasionally,
the restriction of being less than or equal to $\alpha/2$ will be relaxed if,
by so doing, the probability of rejecting a true hypothesis will be only
slightly larger than the chosen $\alpha$.

Example 7.12

In a certain cross of two varieties of peas, genetic theory led the
investigator to expect one-half of the seeds produced to be wrinkled and
the remaining one-half to be smooth. Taking $\alpha = 0.01$ and $n = 40$,
determine $L$ and $U$, and thus define the acceptance and rejection regions.
Using Equations (7.4) and (7.5) with $p_0 = 0.5$, we obtain $L = 11$ and
$U = 29$. Therefore, the acceptance region consists of those values of $x$
for which $11 < x < 29$.
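
The search for $L$ and $U$ described above can be sketched with exact binomial sums. This is a minimal illustration, assuming the customary rule stated in the text (each tail sum as large as possible without exceeding $\alpha/2$); the helper names `lower_tail` and `upper_tail` are hypothetical.

```python
from math import comb


def lower_tail(n, p, k):
    """P(X <= k) for a binomial(n, p) variate, as in Equation (7.4)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def upper_tail(n, p, k):
    """P(X >= k) for a binomial(n, p) variate, as in Equation (7.5)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


n, p0, alpha = 40, 0.5, 0.01

# Largest L whose lower-tail sum does not exceed alpha/2.
L = max(k for k in range(n + 1) if lower_tail(n, p0, k) <= alpha / 2)
# Smallest U whose upper-tail sum does not exceed alpha/2.
U = min(k for k in range(n + 1) if upper_tail(n, p0, k) <= alpha / 2)

acceptance_region = list(range(L + 1, U))   # integers strictly between L and U
print(L, U, acceptance_region[0], acceptance_region[-1])
```

For the pea data this reproduces the limits of Example 7.12, so the acceptance region runs from 12 through 28 inclusive.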


Without adequate tables, the procedure discussed so far in this
section is not very palatable to the researcher. Consequently, some
approximate procedures which lend themselves to easy calculation will
be investigated.

In Section 5.11 the normal distribution was suggested as a possible
approximation to the binomial distribution. If such an approximation
is used, Equations (7.4) and (7.5) are replaced by

$$P\left\{Z < \frac{(L + 0.5) - np_0}{\sqrt{np_0(1 - p_0)}}\right\} = \alpha/2 \tag{7.6}$$

and

$$P\left\{Z > \frac{(U - 0.5) - np_0}{\sqrt{np_0(1 - p_0)}}\right\} = \alpha/2 \tag{7.7}$$

where $Z$ is a standard normal variate. These equations may then be
solved for $L$ and $U$.

Example 7.13

Using the normal approximation, find $L$ and $U$ for the situation
described in Example 7.12. Since $\alpha = 0.01$, we see that

$$\{(L + 0.5) - 40(0.5)\}/\sqrt{40(0.5)(0.5)} = -2.575$$

and

$$\{(U - 0.5) - 40(0.5)\}/\sqrt{40(0.5)(0.5)} = 2.575.$$

Thus, $L = 11.4 \approx 11$ and $U = 28.6 \approx 29$.
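
Solving Equations (7.6) and (7.7) for $L$ and $U$ is a one-line rearrangement. The sketch below assumes the normal deviate 2.575 quoted in Example 7.13 and uses illustrative variable names.

```python
from math import sqrt

n, p0 = 40, 0.5
z = 2.575                       # z value for alpha/2 = 0.005, as quoted in the text

sd = sqrt(n * p0 * (1 - p0))    # standard deviation of the binomial count

# Rearranging (7.6): (L + 0.5 - n*p0) / sd = -z
L = n * p0 - 0.5 - z * sd
# Rearranging (7.7): (U - 0.5 - n*p0) / sd = +z
U = n * p0 + 0.5 + z * sd

print(f"L = {L:.1f}, U = {U:.1f}")
```

Rounding to the nearest integers gives the same limits as the exact calculation of Example 7.12.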

Rather than proceed as indicated in the preceding paragraph, it is
common to take the number of events ($x$) occurring in the class
associated with $p$ and calculate

$$Z = \frac{(x + 0.5) - np_0}{\sqrt{np_0(1 - p_0)}} \quad \text{for } x < np_0 \tag{7.8}$$

or

$$Z = \frac{(x - 0.5) - np_0}{\sqrt{np_0(1 - p_0)}} \quad \text{for } x > np_0. \tag{7.9}$$

Then, the hypothesis $H: p = p_0$ will be rejected if $Z < z_{\alpha/2}$ or
if $Z > z_{1-\alpha/2}$, where $z_{\alpha/2}$ and $z_{1-\alpha/2}$ are found
in Appendix 3.

Example 7.14

Consider the situation described in Example 7.12. A random sample
of 40 seeds segregated into 30 wrinkled and 10 smooth. Using Equation
(7.9),

$$Z = \{(30 - 0.5) - 40(0.5)\}/\sqrt{40(0.5)(0.5)} \approx 3 > z_{.995} = 2.58.$$

Therefore, $H: p = 0.5$ is rejected.

Another useful approximation is available because the square of a
standard normal variate is distributed as chi-square with one degree of
freedom (see Section 5.18). When the chi-square approximation is used,
the test statistic is

$$\chi^2 = \sum_{i=1}^{2} (|O_i - E_i| - 0.5)^2/E_i \tag{7.10}$$

where

$$O_1 = x, \qquad O_2 = n - x,$$
$$E_1 = np_0, \qquad E_2 = n(1 - p_0).$$

The hypothesis $H: p = p_0$ will be rejected if
$\chi^2 > \chi^2_{(1-\alpha)(1)}$; otherwise, $H$ will be accepted. It
should be clear that $O$ stands for observed and that $E$ stands for
expected in Equation (7.10).

Example 7.15

It will be instructive to rework Example 7.14 using this method.
Thus,

$$\chi^2 = (|30 - 20| - 0.5)^2/20 + (|10 - 20| - 0.5)^2/20 = 9.025 > \chi^2_{.99(1)} = 6.63.$$

Therefore, as in the preceding example, $H: p = 0.5$ is rejected. (NOTE:
$\chi^2 = 9.025 = Z^2 \approx (3)^2$.)
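
The equivalence noted in Example 7.15 can be confirmed numerically. The following sketch, with illustrative names, evaluates Equation (7.9) and Equation (7.10) for the 30 wrinkled and 10 smooth seeds and compares $Z^2$ with $\chi^2$.

```python
from math import sqrt

n, p0, x = 40, 0.5, 30          # 30 wrinkled seeds out of 40, H: p = 0.5

# Continuity-corrected normal deviate: Equation (7.8) if x < n*p0, (7.9) if x > n*p0.
correction = 0.5 if x < n * p0 else -0.5
z = ((x + correction) - n * p0) / sqrt(n * p0 * (1 - p0))

# Continuity-corrected chi-square, Equation (7.10).
observed = [x, n - x]
expected = [n * p0, n * (1 - p0)]
chi2 = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

print(f"Z = {z:.3f}, Z^2 = {z * z:.3f}, chi-square = {chi2:.3f}")
```

Both routes give 9.025, so the two-tailed normal test and the one-degree-of-freedom chi-square test lead to the same decision.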


7.8      BINOMIAL POPULATION; $H: p \le p_0$ VERSUS $A: p > p_0$,
OR $H: p \ge p_0$ VERSUS $A: p < p_0$

In many practical situations, the hypothesis $H: p \le p_0$ is more
appropriate than $H: p = p_0$. An example of this would be any hypothesis
concerning the per cent of defective items in a production lot. When
dealing with this type of problem, the researcher may use only the
exact procedure or the normal approximation. The chi-square
approximation may not be used because it effectively adds together the
areas under both tails of the standard normal curve when only a one-tailed
test is appropriate.

Only the case $H: p \le p_0$ versus $A: p > p_0$ will be discussed in detail.
The discussion for the case $H: p \ge p_0$ versus $A: p < p_0$ would proceed
in a similar fashion, the only change being which tail of the distribution
is used for the rejection region. The value of $U$ which defines the
acceptance and rejection regions is determined by solving

$$\sum_{x=U}^{n} C(n, x)\,p_0^x (1 - p_0)^{n-x} = \alpha \tag{7.11}$$

for $U$. As before, it will be necessary to settle for that value of $U$ such
that the value of the summation closely approximates $\alpha$. The rejection
region, then, consists of the positive integers greater than or equal to
$U$. If the normal approximation is used, calculate

$$Z = \{(x - 0.5) - np_0\}/\sqrt{np_0(1 - p_0)} \tag{7.12}$$

and reject $H$ if $Z > z_{1-\alpha}$.

Example 7.16

From past experience it has been determined that a qualified operator
on a certain machine turning out 400 items per day produces 20 or
fewer defective items per day. A new operator is hired to run the same
machine and the hypothesis is made that he is a qualified operator.
Taking $\alpha = 0.03$, determine $U$, and thus define the acceptance and
rejection regions. Here, the hypothesis is $H: p \le 0.05$ and $n = 400$. Using
Equation (7.11), we find that $U = 29$. Thus, if the new operator
produced more than 28 defective items in a run of 400, we would reject the
hypothesis that he is a qualified operator.

Example 7.17

Using the normal approximation, test the hypothesis $H: p \le 0.05$
versus $A: p > 0.05$, given that $\hat{p} = 32/400 = 0.08$. Let
$\alpha = 0.03$. Using Equation (7.12),

$$Z = \{(32 - 0.5) - 400(0.05)\}/\sqrt{400(0.05)(0.95)} \approx 2.6 > z_{.97} = 1.89.$$

Thus, $H: p \le 0.05$ is rejected.
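
As a quick arithmetic check on Example 7.17, the one-tailed statistic of Equation (7.12) can be evaluated as below. This is a sketch only; the critical value is the one quoted in the example, not one computed from a normal table.

```python
from math import sqrt

n, p0, x = 400, 0.05, 32        # 32 defectives observed in a run of 400
z_critical = 1.89               # upper 3 per cent point, as quoted in the text

# Equation (7.12): continuity-corrected deviate for the upper-tail test.
z = ((x - 0.5) - n * p0) / sqrt(n * p0 * (1 - p0))
reject = z > z_critical

print(f"Z = {z:.2f}, reject H: {reject}")
```

The statistic is about 2.64, which the text rounds to 2.6.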

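7.9     TWO NORMAL POPULATIONS; $H: \mu_1 = \mu_2$ VERSUS $A: \mu_1 \ne \mu_2$

The methods to be described here are closely allied with those
discussed when obtaining confidence limits for $\mu_1 - \mu_2$. Consequently,
it is recommended that the reader review the earlier material. Without
further preamble, we shall present and illustrate the appropriate
procedures.

Case I: $\sigma_1^2 = \sigma_2^2$

In this case the procedure is to calculate

$$t = (\bar{X}_1 - \bar{X}_2)/s_{\bar{X}_1 - \bar{X}_2} \tag{7.13}$$

where

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2 (n_1 + n_2)/n_1 n_2} \tag{7.14}$$

and

$$s^2 = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \tag{7.15}$$

and to reject $H: \mu_1 = \mu_2$ if $t < -t_{(1-\alpha/2)(n_1+n_2-2)}$ or if
$t > t_{(1-\alpha/2)(n_1+n_2-2)}$.

Clearly, some simplification of the formulas will occur if $n_1 = n_2$.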

Example 7.18

Wire cable is being manufactured by two processes. We wish to
determine if the processes are having different effects on the mean
breaking strength of the cable. Laboratory tests were performed by
putting samples of cable under tension and recording the load required
to break the cable. Using the data given in Table 7.4, and letting
$\alpha = 0.05$, test the hypothesis $H: \mu_1 = \mu_2$ versus
$A: \mu_1 \ne \mu_2$. Calculations yield

$$\bar{X}_1 = 8.17, \quad \bar{X}_2 = 11.29, \quad s^2 = 5.29, \quad \text{and} \quad t = -2.44.$$

Since $t_{.975(11)} = 2.201$ and $|t| = 2.44$ exceeds it, the hypothesis $H$
is rejected.

Example 7.19

Two rations (feeds) are to be compared with respect to their effect
on the weight gains of hogs. Ten animals are available, and five are fed
feed No. 1 while the other five are fed feed No. 2. Using $\alpha = 0.10$,
test the hypothesis that the two feeds are equally effective in causing hogs
to gain weight. The data obtained are given in Table 7.5. Calculations
yield $t = -1.58$. Since $t_{.95(8)} = 1.860$, we are unable to reject
$H: \mu_1 = \mu_2$.

TABLE 7.4-Critical Values of the Load (Coded Data)

Process No. 1     Process No. 2
      9                14
      4                 9
     10                13
      7                12
      9                13
     10                 8
                       10

TABLE 7.5-Gains in Weight (in Lbs.)

Feed No. 1     Feed No. 2
    1              4
    2              3
    4              9
    5             10
    8              9
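
The Case I procedure of Equations (7.13) to (7.15) can be sketched as a small function and applied to the data of Tables 7.4 and 7.5. The function name `pooled_t` is hypothetical, and the critical values are those quoted in Examples 7.18 and 7.19.

```python
from math import sqrt


def pooled_t(sample1, sample2):
    """Return (t, pooled variance, degrees of freedom) for the equal-variance case."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    s2 = (ss1 + ss2) / (n1 + n2 - 2)                 # Equation (7.15)
    se = sqrt(s2 * (n1 + n2) / (n1 * n2))            # Equation (7.14)
    return (mean1 - mean2) / se, s2, n1 + n2 - 2     # Equation (7.13)


# Table 7.4 (coded breaking loads) and Table 7.5 (weight gains in lbs.).
process1, process2 = [9, 4, 10, 7, 9, 10], [14, 9, 13, 12, 13, 8, 10]
feed1, feed2 = [1, 2, 4, 5, 8], [4, 3, 9, 10, 9]

t_cable, s2_cable, df_cable = pooled_t(process1, process2)
t_feed, s2_feed, df_feed = pooled_t(feed1, feed2)

# Two-tailed decisions using the tabled values quoted in the examples.
reject_cable = abs(t_cable) > 2.201   # t_.975(11)
reject_feed = abs(t_feed) > 1.860     # t_.95(8)

print(f"cable: t = {t_cable:.2f}, d.f. = {df_cable}, reject: {reject_cable}")
print(f"feeds: t = {t_feed:.2f}, d.f. = {df_feed}, reject: {reject_feed}")
```

The function returns $t$ close to $-2.44$ for the cable data and close to $-1.58$ for the feeding trial, in agreement with the two examples.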

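Case II: $\sigma_1^2 \ne \sigma_2^2$

When this situation prevails, that is, when we are unwilling to
assume that $\sigma_1^2$ equals $\sigma_2^2$, a reasonably good approximate
procedure is as follows. Compute

$$t' = (\bar{X}_1 - \bar{X}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2} \tag{7.16}$$

and reject $H: \mu_1 = \mu_2$ if

$$t' < -(w_1 t_1 + w_2 t_2)/(w_1 + w_2)$$

or if

$$t' > (w_1 t_1 + w_2 t_2)/(w_1 + w_2)$$

where

$$w_1 = s_1^2/n_1, \qquad w_2 = s_2^2/n_2 \tag{7.17}$$

$$t_1 = t_{(1-\alpha/2)(n_1-1)}, \qquad t_2 = t_{(1-\alpha/2)(n_2-1)}. \tag{7.18}$$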

Example 7.20

As an illustration, Example 7.18 will be reworked on the assumption
that $\sigma_1^2$ does not equal $\sigma_2^2$. Thus, $\bar{X}_1 = 8.17$,
$\bar{X}_2 = 11.29$, $s_1^2 = 5.4$, $s_2^2 = 5.2$, $w_1 = 0.9$, $w_2 = 0.74$,
$t' = -2.4$, $t_1 = 2.571$, $t_2 = 2.447$, and the weighted average of $t_1$
and $t_2$ is 2.52. Conclusion: accept $H$.

Case III: Paired Observations

The procedure in this case is to calculate

$$t = \bar{D}/s_{\bar{D}} = \bar{D}/\sqrt{s_D^2/n} \tag{7.19}$$

and to reject $H: \mu_1 = \mu_2$ (or $H: \mu_D = \mu_1 - \mu_2 = 0$) if
$t < -t_{(1-\alpha/2)(n-1)}$ or if $t > t_{(1-\alpha/2)(n-1)}$. Here, of
course, $n$ is the number of pairs of observations. Or, in other words, $n$
is the number of differences, $D = X - Y$. Also, it should be clear that
$\bar{D} = \bar{X} - \bar{Y}$.

Example 7.21

In a Brinell hardness test, a hardened steel ball is pressed into the
material being tested under a standard load. The diameter of the
spherical indentation is then measured. Two steel balls are available (one
from each of two manufacturers) and their performance will be
compared on 15 pieces of material. Each piece of material will be tested
twice, once with each ball. The data obtained are given in Table 7.6.
Calculations yield $\bar{D} = 8$, $s_D^2 = 121.6$, and $t = 2.81$. Using
$\alpha = 0.05$, it is seen that $2.81 > t_{.975(14)} = 2.145$, and thus we
reject the hypothesis that the two steel balls give the same average
hardness indication.

TABLE 7.6-Data Obtained in a Brinell Hardness Test

                  Diameters
Sample No.       X        Y       D = X - Y
     1          73       51          22
     2          43       41           2
     3          47       43           4
     4          53       41          12
     5          58       47          11
     6          47       32          15
     7          52       24          28
     8          38       43          -5
     9          61       53           8
    10          56       52           4
    11          56       57          -1
    12          34       44         -10
    13          55       57          -2
    14          65       40          25
    15          75       68           7
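
The paired-observation statistic of Equation (7.19) can be checked against the Brinell data of Table 7.6. The sketch below uses illustrative variable names and the tabled value 2.145 quoted in Example 7.21.

```python
from math import sqrt

# Indentation diameters from Table 7.6: ball X and ball Y on the same 15 pieces.
x = [73, 43, 47, 53, 58, 47, 52, 38, 61, 56, 56, 34, 55, 65, 75]
y = [51, 41, 43, 41, 47, 32, 24, 43, 53, 52, 57, 44, 57, 40, 68]

differences = [a - b for a, b in zip(x, y)]
n = len(differences)

d_bar = sum(differences) / n
var_d = sum((d - d_bar) ** 2 for d in differences) / (n - 1)

t = d_bar / sqrt(var_d / n)      # Equation (7.19), with n - 1 degrees of freedom
reject = abs(t) > 2.145          # t_.975(14), as quoted in Example 7.21

print(f"mean difference = {d_bar}, variance = {var_d:.1f}, t = {t:.2f}, reject: {reject}")
```

Working with the differences removes the piece-to-piece variation in the material, which is why only $n - 1 = 14$ degrees of freedom remain.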

As has been stated before, the OC curves associated with tests of
significance must be examined if one is to be certain that the suggested
test procedure is discriminating enough. Once again we shall beg the
question, and refer the reader to Bowker and Lieberman (3) for samples
of such curves.

7.10     TWO NORMAL POPULATIONS; $H: \mu_1 \le \mu_2$ VERSUS $A: \mu_1 > \mu_2$,
OR $H: \mu_1 \ge \mu_2$ VERSUS $A: \mu_1 < \mu_2$

By now, the technique for one-tailed test procedures should be clear.
Consequently, only a brief discussion will be given. With reference to
the three cases discussed in the preceding section, the same test
statistic will be calculated here as was calculated there. The only
difference will be in the selection of the critical values of $t$ from the
table in Appendix 5. As in other examples of one-tailed tests, the values
will be chosen so that all of $\alpha$ (rather than $\alpha/2$) will be at one
end of the distribution. Some OC curves are again available in Bowker and
Lieberman (3). The tests are summarized in Table 7.7.

TABLE 7.7-One-Sided Test Procedures for Comparing the Means of Two
Normal Populations

Hypothesis           Assumption                      Statistic   Equation   Rejection Region
$\mu_1 \le \mu_2$    $\sigma_1^2 = \sigma_2^2$       $t$         7.13       $t > t_{(1-\alpha)(n_1+n_2-2)}$
$\mu_1 \le \mu_2$    $\sigma_1^2 \ne \sigma_2^2$     $t'$        7.16       $t' >$ weighted average using $100(1-\alpha)$ per cent points
$\mu_D \le 0$        paired observations             $t$         7.19       $t > t_{(1-\alpha)(n-1)}$
$\mu_1 \ge \mu_2$    $\sigma_1^2 = \sigma_2^2$       $t$         7.13       $t < -t_{(1-\alpha)(n_1+n_2-2)}$
$\mu_1 \ge \mu_2$    $\sigma_1^2 \ne \sigma_2^2$     $t'$        7.16       $t' <$ the negative of the weighted average using $100(1-\alpha)$ per cent points
$\mu_D \ge 0$        paired observations             $t$         7.19       $t < -t_{(1-\alpha)(n-1)}$

Example 7.22

Two pieces of meat, one a control and the other treated to tenderize
the fibers, are to be tested. Tenderness will be measured by the force
needed to shear samples of meat. (Lower shear force values indicate
more tender meat.) Given the data in Table 7.8, and letting
$\alpha = 0.025$, test the hypothesis $H: \mu_T \ge \mu_C$ versus
$A: \mu_T < \mu_C$. Calculations yield

TABLE 7.8-Shear Force Values for Tenderness Test

Control    50   44   24   50   41   43
Treated    46   40   32   23   54   51

Since $t_{.975(12)} = 2.179$, we reject $H$ and conclude that the treatment
does improve the tenderness of the meat.

Further  examples  could  be  given.  However,  rather  than  take  up 
space  for  such  a  purpose,  we  will  rely  on  problems  to  illustrate  the 
other  cases. 

7.11      TWO NORMAL POPULATIONS; $H: \sigma_1^2 = \sigma_2^2$ VERSUS $A: \sigma_1^2 \ne \sigma_2^2$

As in Section 6.10, the F-ratio will be the appropriate statistic. That
is, the procedure will be to calculate

$$F = s_1^2/s_2^2 \tag{7.20}$$

and reject $H$ if $F < F_{(\alpha/2)(n_1-1,\,n_2-1)}$ or if
$F > F_{(1-\alpha/2)(n_1-1,\,n_2-1)}$. Alternatively, we can calculate

$$F = \frac{\text{larger sample variance}}{\text{smaller sample variance}} \tag{7.21}$$

and reject only if $F > F_{(1-\alpha/2)(\nu_1,\,\nu_2)}$ where $\nu_1$ and
$\nu_2$ represent, respectively, the degrees of freedom associated with the
numerator and denominator. OC curves may be obtained by calculating

$$\beta = P\{(\sigma_2^2/\sigma_1^2)\,F_{(\alpha/2)(n_1-1,\,n_2-1)} < F < (\sigma_2^2/\sigma_1^2)\,F_{(1-\alpha/2)(n_1-1,\,n_2-1)}\}.$$

Sample OC curves are given in Bowker and Lieberman (3).

Example 7.23

Using the data of Example 6.4 and letting $\alpha = 0.05$, test the
hypothesis $H: \sigma_1^2 = \sigma_2^2$ versus $A: \sigma_1^2 \ne \sigma_2^2$.
It is seen that $F = 109.63/65.99 = 1.66$ with $\nu_1 = 40$ and $\nu_2 = 30$
degrees of freedom. Since $F_{.975(40,30)} = 2.01$, we are unable to reject
$H$.

7.12     TWO NORMAL POPULATIONS; $H: \sigma_1^2 \le \sigma_2^2$ VERSUS $A: \sigma_1^2 > \sigma_2^2$,
OR $H: \sigma_1^2 \ge \sigma_2^2$ VERSUS $A: \sigma_1^2 < \sigma_2^2$

As  was  done  in  Section  7.10,  only  a  summary  of  the  test  procedures 
will  be  given.  This  appears  in  Table  7.9.  No  examples  will  be  given, 
but  some  of  the  problems  at  the  end  of  the  chapter  will  provide  an 
opportunity  to  apply  the  indicated  method.  As  in  other  sections,  solv 
ing  of  problems  is  strongly  recommended  as  an  aid  in  increasing  an 
understanding  of  the  various  methods. 

TABLE 7.9-One-Sided Test Procedures for Comparing
The Variances of Two Normal Populations

Hypothesis                      Statistic   Equation   Rejection Region
$\sigma_1^2 \le \sigma_2^2$     $F$         7.20       $F > F_{(1-\alpha)(n_1-1,\,n_2-1)}$
$\sigma_1^2 \ge \sigma_2^2$     $F$         7.20       $F < F_{\alpha(n_1-1,\,n_2-1)}$

Reference is again made to Bowker and Lieberman (3) for those who
wish to examine OC curves associated with the tests of this section.

7.13      MULTINOMIAL   DATA 

Many  times,  our  sample  elements  may  be  assigned  to  any  one  of  sev 
eral  different  classes,  or  categories,  rather  than  simply  to  one  or  the 
other  of  two  classes  as  in  Section  7.7.  In  such  a  situation  we  must  work 
with  the  multinomial  distribution  rather  than  the  binomial  distri 
bution. 

A common problem is to test the hypothesis

$$H: p_i = p_{i0} \qquad (i = 1, 2, \ldots, k)$$

where there are $k$ classes. Of course,

$$\sum_{i=1}^{k} p_i = \sum_{i=1}^{k} p_{i0} = 1.$$

A simple test procedure is available by means of the chi-square
approximation. In this case, the degrees of freedom equal $k - 1$, that is,
one less than the number of classes (or parameters). The procedure is to
calculate

$$\chi^2 = \sum_{i=1}^{k} (O_i - E_i)^2/E_i \tag{7.22}$$

where $O_i$ represents the number observed in the $i$th class and
$E_i = np_{i0}$ represents the number expected in the $i$th class if $H$ is
true. Clearly, $\sum O_i = \sum E_i = n$. Then, if
$\chi^2 > \chi^2_{(1-\alpha)(k-1)}$, the hypothesis $H$ is rejected.


Example 7.24

In a particular genetic experiment, the observations were classified
as follows:

Class A: 99
Class B: 33
Class C: 24
Class D: 4

but genetic theory called for a 9:3:3:1 ratio. Using a 5 per cent
significance level, do the data support the theory? Calculation yields

$$\chi^2 = (99 - 90)^2/90 + (33 - 30)^2/30 + (24 - 30)^2/30 + (4 - 10)^2/10 = 6.0.$$

This is less than $\chi^2_{.95(3)} = 7.81$, and thus we are unable to reject
the hypothesized theory.
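
Equation (7.22) translates directly into a few lines of code. The sketch below, with illustrative names, derives the expected counts from the 9:3:3:1 ratio and compares the statistic with the 7.81 quoted in Example 7.24.

```python
# Observed class counts and the theoretical 9:3:3:1 ratio from Example 7.24.
observed = {"A": 99, "B": 33, "C": 24, "D": 4}
ratio = {"A": 9, "B": 3, "C": 3, "D": 1}

n = sum(observed.values())
ratio_total = sum(ratio.values())

# Expected count in each class under H: E_i = n * p_i0.
expected = {cls: n * r / ratio_total for cls, r in ratio.items()}

# Equation (7.22): sum of (O - E)^2 / E over the k classes.
chi2 = sum((observed[cls] - expected[cls]) ** 2 / expected[cls] for cls in observed)
degrees_of_freedom = len(observed) - 1

reject = chi2 > 7.81             # chi-square .95 point with 3 d.f., as quoted in the text
print(expected, f"chi-square = {chi2:.1f}", f"d.f. = {degrees_of_freedom}", reject)
```

Class D contributes 3.6 of the total 6.0, so most of the discrepancy comes from the smallest class.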

7.14      POISSON DATA

There are several processes which give rise to observations
distributed according to the Poisson probability function

$$f(x) = e^{-\lambda}\lambda^x/x!; \qquad x = 0, 1, 2, \ldots. \tag{7.23}$$

Some examples are: (1) radioactive disintegrations, (2) bomb hits on a
given area, (3) chromosome interchanges in cells, and (4) flaws in
materials.

Obviously, many hypotheses and alternatives could be considered
and discussed. However, for purposes of illustrating the methods of
analysis, only two will be examined.

To test the hypothesis $H: \lambda \le \lambda_0$ versus
$A: \lambda > \lambda_0$, it would be appropriate to obtain, for a sample of
one,

$$P = 1 - F(x - 1) \tag{7.24}$$

where $F(x)$ is read from Appendix 2 under the assumption
$\lambda = \lambda_0$. If $P < \alpha$, the hypothesis $H$ would be rejected.

Example 7.25

A random sample of two phonograph records shows 1 and 4 defects per
record, respectively. Assuming $\alpha = 0.01$, test the hypothesis
$H: \lambda \le 0.5$ versus $A: \lambda > 0.5$. (NOTE: This is testing the
hypothesis that the average number of defects per record is less than or
equal to 1/2.) Since we have a total of 5 defects from 2 records, we make
use of the fact that $w = x_1 + x_2$ also follows a Poisson distribution with
parameter $\lambda' = n\lambda = 2\lambda$. Consulting Appendix 2 for
$\lambda' = 2\lambda_0 = 2(1/2) = 1$, we see that
$F(w - 1) = F(5 - 1) = F(4) = 0.996$ and thus $P = 1 - F(w - 1) = 0.004$.
Since this is less than $\alpha = 0.01$, the hypothesis $H: \lambda \le 0.5$
is rejected in favor of the alternative $A: \lambda > 0.5$.
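
Where Appendix 2 is not at hand, the cumulative Poisson probability in Equation (7.24) can be summed directly from Equation (7.23). This is a minimal sketch; `poisson_cdf` is a hypothetical helper, not a library function.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """F(k) = P(X <= k) for a Poisson variate with parameter lam."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))


defects = [1, 4]                 # defects found on the two records of Example 7.25
lambda0, alpha = 0.5, 0.01

w = sum(defects)                         # total count, Poisson with parameter n * lambda
lam_total = len(defects) * lambda0       # parameter of the total under H

p_value = 1 - poisson_cdf(w - 1, lam_total)   # Equation (7.24) applied to the total
reject = p_value < alpha

print(f"F({w - 1}) = {poisson_cdf(w - 1, lam_total):.3f}, P = {p_value:.4f}, reject: {reject}")
```

The sum reproduces the tabled $F(4) = 0.996$ and the tail probability of about 0.004 used in the example.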

The second situation to be examined is of interest from a
methodological point of view since it combines the assumption of a Poisson
distribution with the chi-square method of analysis. Essentially, it is a
comparison of several Poisson distributions to see if the parameters
(that is, the $\lambda$'s) differ significantly. The procedure is best
illustrated by an example.

Example 7.26

Suppose a phonograph record manufacturing company is investigating
5 different production processes. Four records are selected at random
from those produced by each process and the number of defects
per record is counted. The data are given in Table 7.10. Chi-square is
then computed for each process, using the observed process average as
the expected number of defects per record for that process. Each of these
chi-squares has 3 degrees of freedom. Using the additive property of
chi-square, it is noted that the total, 9.70, has 15 degrees of freedom.
It can be verified that none of these 6 values of chi-square is significant
at the 1 per cent level. Thus, there is little question about the
uniformity of records produced by the same process. However, if the
chi-square representing the variation among processes is calculated, that is

$$\chi^2 = [(64 - 35.2)^2 + (28 - 35.2)^2 + (32 - 35.2)^2 + (32 - 35.2)^2 + (20 - 35.2)^2]/35.2 = 32.18,$$

we see that $\chi^2 = 32.18 > \chi^2_{.99(4)} = 13.3$. Therefore, the
hypothesis of no differences among processes is rejected. It might be
concluded that some processes (probably numbers II, III, IV, and V) will
allow production of product containing fewer defects.

TABLE 7.10-Number of Defects per Record From XYZ
Manufacturing Company

           Number of Defects     Process    Process
Process    per Record            Totals     Means      $\sum_{j=1}^{4} (O_{ij} - E_i)^2/E_i$
I          11, 16, 17, 20          64         16       42/16 = 2.62
II          5,  7,  5, 11          28          7       24/7  = 3.43
III        11,  9,  7,  5          32          8       20/8  = 2.50
IV          8, 10,  7,  7          32          8        6/8  =  .75
V           5,  6,  5,  4          20          5        2/5  =  .40
Total                             176                   9.70
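
The two chi-square calculations of Example 7.26 can be sketched as follows. The dictionary layout and names are illustrative; the data are those of Table 7.10.

```python
# Defects per record for each process, from Table 7.10.
defects = {
    "I": [11, 16, 17, 20],
    "II": [5, 7, 5, 11],
    "III": [11, 9, 7, 5],
    "IV": [8, 10, 7, 7],
    "V": [5, 6, 5, 4],
}

# Within-process chi-squares: the expected count is the process mean (3 d.f. each).
within = {}
for process, counts in defects.items():
    mean = sum(counts) / len(counts)
    within[process] = sum((c - mean) ** 2 / mean for c in counts)
within_total = sum(within.values())              # 15 d.f. by the additive property

# Among-process chi-square: the expected total is the average process total (4 d.f.).
totals = [sum(counts) for counts in defects.values()]
expected_total = sum(totals) / len(totals)
among = sum((t - expected_total) ** 2 / expected_total for t in totals)

for process, value in within.items():
    print(f"{process:>3}: {value:.2f}")
print(f"within total = {within_total:.2f}, among processes = {among:.2f}")
```

The within-process total of about 9.70 and the among-process value of about 32.18 match the figures in the example.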

7.15      CHI-SQUARE  TEST  OF  GOODNESS  OF    FIT 

One thing that is often done, with no justification other than saying
it appears reasonable, is to assume that the variate under discussion
follows a particular distribution. For example, data are frequently
assumed to be samples from a normal population, and you may well
question this assumption. At this time, one procedure useful in
checking on the validity of such assumptions will be presented.

The procedure is to make a comparison between the actual number
of observations and the expected number of observations (expected
under the "assumption") for various values of the variate. The
expected numbers are usually calculated by using the assumed
distribution with the parameters set equal to their sample estimates. The
chi-square statistic will be calculated according to Equation (7.22) and the
degrees of freedom will be $k - p - 1$, where $p$ represents the number of
parameters estimated by sample statistics. For example, if a normality
assumption were under test, $\mu$ and $\sigma^2$ would be estimated by
$\bar{X}$ and $s^2$, and the degrees of freedom would be $k - 3$, where $k$
represents the number of class intervals used in fitting the distribution.
If the assumption of a Poisson distribution were being tested,
$\lambda = \mu$ would be estimated by $\bar{X}$, and the degrees of freedom
would be $k - 2$.

Rather than continue the discussion in general terms, an example
involving the Poisson distribution will be studied.


Example 7.27

The data given in Table 7.11 show the number of "senders" (a type
of automatic equipment used in telephone exchanges) that were in use
at a given instant. Observations were made on 3754 different occasions.
The expected numbers were calculated from
$f(x) = e^{-\lambda}\lambda^x/x!$ where $\lambda$ was set equal to
$\bar{X} = 10.44$. Since $\chi^2 = 43.43 > \chi^2_{.99(20)} = 37.6$, the
hypothesis of a Poisson distribution with $\mu = 10.44$ is rejected.
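
The goodness-of-fit statistic of Example 7.27 can be recomputed from the observed and expected frequencies of Table 7.11, with the first two classes (0 and 1 busy senders) already combined. This is a sketch under that assumption; the recomputed value differs slightly from the tabled 43.43 because the table's individual entries are rounded.

```python
# Observed and expected frequencies from Table 7.11 (classes 0 and 1 combined).
observed = [5, 14, 24, 57, 111, 197, 278, 378, 418, 461, 433,
            413, 358, 219, 145, 109, 57, 43, 16, 7, 8, 3]
expected = [1.26, 5.98, 20.82, 54.33, 113.44, 197.38, 294.38, 384.16,
            445.63, 465.24, 441.56, 384.15, 308.50, 230.05, 160.11,
            104.47, 64.16, 37.21, 20.45, 10.67, 5.31, 4.51]

parameters_estimated = 1         # lambda was estimated by the sample mean
k = len(observed)                # number of classes after combining
degrees_of_freedom = k - parameters_estimated - 1

# Equation (7.22) applied class by class.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

reject = chi2 > 37.6             # chi-square .99 point with 20 d.f., as quoted in the text
print(f"k = {k}, d.f. = {degrees_of_freedom}, chi-square = {chi2:.2f}, reject: {reject}")
```

Roughly half of the statistic comes from the two lowest classes, where the expected frequencies are smallest.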

TABLE 7.11-Number of Busy Senders in a Telephone Exchange*

Number    Observed        Expected        Deviation
Busy      Frequency (O)   Frequency (E)   (O - E)      (O - E)^2/E
  0            0             0.11 }
  1            5             1.15 }       +  3.74        11.01
  2           14             5.98         +  8.02        10.76
  3           24            20.82         +  3.18          .49
  4           57            54.33         +  2.67          .13
  5          111           113.44         -  2.44          .05
  6          197           197.38         -  0.38          .00
  7          278           294.38         - 16.38          .91
  8          378           384.16         -  6.16          .10
  9          418           445.63         - 27.63         1.71
 10          461           465.24         -  4.24          .03
 11          433           441.56         -  8.56          .17
 12          413           384.15         + 28.85         2.17
 13          358           308.50         + 49.50         7.94
 14          219           230.05         - 11.05          .53
 15          145           160.11         - 15.11         1.43
 16          109           104.47         +  4.53          .20
 17           57            64.16         -  7.16          .80
 18           43            37.21         +  5.79          .90
 19           16            20.45         -  4.45          .97
 20            7            10.67         -  3.67         1.26
 21            8             5.31         +  2.69         1.36
 22            3             4.51         -  1.51          .51
Total       3754          3753.77         +  0.23     $\chi^2$ = 43.43

* Source: Thornton C. Fry, Probability and Its Engineering Uses (New York: D. Van
Nostrand Company, Inc., 1928), p. 295.

One point to be noted in Example 7.27 was the combination of the
entries of the top two lines of the table to form a single class. This was
done because the expected number on the first line was too small. The
reason for avoiding such expected numbers is that they lead to large
chi-square values (perhaps even significant values of chi-square) which
do not reflect a departure of "observed from expected" but only the
smallness of the "expected." In other words, if some expected numbers
are too small, the chi-square statistic will be a poor indicator of the
validity of the hypothesis under test. Some authors say that "too
small" means less than 3; others say less than 5. Since not everyone is
agreed on the interpretation of what is too small, you should feel free
to use any reasonable definition. Personally, I favor the value "3."

7.16      BINOMIAL POPULATION; MORE THAN ONE SAMPLE

A situation which occurs frequently in experimental work is the
following: A hypothesis is to be tested and several experiments are
conducted to produce data which bear on the problem. When this
situation prevails, it is natural to think of combining the experimental
results. For example, if the hypothesis $H: p = p_0$ is being tested relative
to the alternative $A: p \ne p_0$, it is quite common to have available $k$
samples (perhaps of different sizes) as a result of $k$ replications, or
repetitions, of the basic experiment.

How should the data from the several samples be combined? There
are two ways this can be done, and each will be discussed and then
illustrated in Example 7.28. It will be noted that the analysis performed
involves the chi-square distribution and depends on the previously
mentioned additive property of chi-square. Actually, several chi-square
values are calculated, and each of these contributes a different item of
information relative to the hypothesis under test.

You will note that a chi-square value (with 1 degree of freedom) is
found for each sample. Each of these values can be interpreted as
in Section 7.7. As the next step in the analysis, we may calculate
$\chi^2 = \chi_1^2 + \chi_2^2 + \cdots + \chi_k^2$ with $k$ degrees of freedom.
This value will be referred to as the pooled chi-square, and it is clearly a
pooling or accumulation of the bits of evidence provided by the $k$
independent samples. This value may now be used to assess the validity of
the hypothesis under test. An alternative way of pooling the information
from several samples is to lump the original data into one large sample and
compute the total chi-square (with 1 degree of freedom) associated with this
super sample. One other statistic should also be obtained, namely, the
heterogeneity chi-square. This quantity, which has $k - 1$ degrees of
freedom, is found by subtracting the total chi-square from the pooled
chi-square. It is used to measure the lack of consistency among the
several samples.

Example  7.28 

Consider again the hypothesis tested in Example 7.12. Now, instead of 1 sample, 8 separate experiments give rise to 8 samples as shown in Table 7.12. Assuming $\alpha = 0.01$, it is seen that: (1) not one of the 8 samples leads to rejection, (2) the super sample of 1600 observations yields $\chi^2 = 2.56$, which is not significant, and (3) the pooled chi-square is significant. Why do we get these seemingly contradictory results? The pooled chi-square is significant because we have accumulated enough evidence from each sample to indicate that the hypothesis $H: p = 0.5$ should be rejected. The reason the total chi-square did not give the same answer is that in 3 samples smooth seeds predominated while in 5 samples wrinkled seeds predominated. This effect was hidden (i.e., the majorities in opposite directions tended to cancel out) when the data were lumped into one large sample. Attention is called to the previously mentioned lack of consistency among the 8 samples by the significant heterogeneity chi-square.
TABLE 7.12-Chi-Square Analysis Combining Data From Several Samples
of Smooth and Wrinkled Peas

Sample      Sample    Number      Number    Chi-Square
Number      Size      Wrinkled    Smooth    ($\chi^2$)    d.f.

1            100        60          40         4.00        1
2            200       108          92         1.28        1
3            180        80         100         2.22        1
4            208       118          90         3.77        1
5            300       165         135         3.00        1
6            182       106          76         4.94        1
7            230       105         125         1.74        1
8            200        90         110         2.00        1

Pooled $\chi^2 = \sum_{i=1}^{8} \chi_i^2$             22.94        8
Total       1600       832         768         2.56        1
Difference                                    20.38        7
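The three quantities of Example 7.28 can be recomputed directly from the counts in Table 7.12. The following is a minimal sketch, not part of the original text; the function name `chi_square_half` and the variable names are illustrative assumptions. Because it works with unrounded values, the pooled figure differs from the tabulated 22.94 in the second decimal place.

```python
# Counts from Table 7.12: (number wrinkled, number smooth) for each sample.
samples = [(60, 40), (108, 92), (80, 100), (118, 90),
           (165, 135), (106, 76), (105, 125), (90, 110)]


def chi_square_half(wrinkled, smooth):
    """Chi-square (1 d.f.) for H: p = 0.5 in a two-class sample."""
    expected = (wrinkled + smooth) / 2
    return ((wrinkled - expected) ** 2 + (smooth - expected) ** 2) / expected


# One chi-square per sample, each with 1 degree of freedom.
individual = [chi_square_half(w, s) for w, s in samples]

# Pooled chi-square: sum of the individual values, k = 8 d.f.
pooled = sum(individual)

# Total chi-square: the lumped "super sample", 1 d.f.
total = chi_square_half(sum(w for w, _ in samples),
                        sum(s for _, s in samples))

# Heterogeneity chi-square: pooled minus total, k - 1 = 7 d.f.
heterogeneity = pooled - total

print(round(pooled, 2), round(total, 2), round(heterogeneity, 2))
```

Each result is then compared with the tabulated chi-square value for its own degrees of freedom, as described in the text.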

7.17      CONTINGENCY  TABLES 

Suppose $n$ randomly selected items are classified according to two different criteria. The tabulation of the results could be presented as in Table 7.13, where $O_{ij}$ represents the number of items belonging to the $(ij)$th cell of the $r \times c$ table. Such data can be used to test the hypothesis that the two classifications, represented by rows and columns, are statistically independent. If this hypothesis is rejected, the two classifications are not independent and we say there is some interaction between the two criteria of classification.

TABLE 7.13-An $r \times c$ Table

                        Columns
Rows        1           2           ...         c

1           $O_{11}$    $O_{12}$    ...         $O_{1c}$
2           $O_{21}$    $O_{22}$    ...         $O_{2c}$
...         ...         ...         ...         ...
r           $O_{r1}$    $O_{r2}$    ...         $O_{rc}$

The exact test for independence is difficult to apply. However, if $n$, the sample size, is sufficiently large, a reasonably good approximate procedure is to calculate

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (7.25)$$

where

$O_{ij}$ = observed number in the $(ij)$th cell,

$E_{ij} = O_{i\cdot}\, O_{\cdot j} / n$ = expected number in the $(ij)$th cell,

$O_{i\cdot} = \sum_{j=1}^{c} O_{ij}$ = observed number in the $i$th row, and

$O_{\cdot j} = \sum_{i=1}^{r} O_{ij}$ = observed number in the $j$th column.

The value of chi-square given by Equation (7.25) has $\nu = (r - 1)(c - 1)$ degrees of freedom. If $\chi^2 \geq \chi^2_{(1-\alpha)[(r-1)(c-1)]}$, the hypothesis of independence should be rejected.

Example  7.29 

A company has to choose among three proposed pension plans. One hypothesis that the company wishes to investigate is: Preference for plans is independent of job classification. It asks the opinion of a sample of the employees and obtains the information presented in Table 7.14. The expected numbers for each cell are calculated and appear in Table 7.15. Calculation then yields $\chi^2 = 11 < \chi^2_{.99(6)} = 16.8$, so the hypothesis cannot be rejected. Thus, it is concluded that the employees' choices of pension plans are quite probably independent of their job classifications.

TABLE 7.14-Classification of Employees by Job and
Pension Plan Preference

                              Number of Employees Favoring
Classification                Plan A    Plan B    Plan C    Total

Factory employees               160        30        10      200
Clerical employees              140        40        20      200
Foremen and supervisors          80        10        10      100
Executives                       70        20        10      100

Total                           450       100        50      600

TABLE 7.15-Expected Number of Observations

Classification                Plan A    Plan B    Plan C

Factory employees               150      100/3      50/3
Clerical employees              150      100/3      50/3
Foremen and supervisors          75      100/6      50/6
Executives                       75      100/6      50/6
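The arithmetic of Example 7.29 can be checked with a short script. This is a sketch only, using the observed counts of Table 7.14; the variable names are illustrative and do not come from the text. Each expected count is the product of the row and column totals divided by $n$, and chi-square is the sum over all cells of (observed - expected)^2/expected.

```python
# Observed counts from Table 7.14 (rows: job classes; columns: Plans A, B, C).
observed = [
    [160, 30, 10],   # Factory employees
    [140, 40, 20],   # Clerical employees
    [80, 10, 10],    # Foremen and supervisors
    [70, 20, 10],    # Executives
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in cell (i, j): (row total)(column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (r - 1)(c - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 2), degrees_of_freedom)
```

The expected counts reproduce Table 7.15 (150, 100/3, 50/3, and so on), and the statistic is compared with the tabulated chi-square value for 6 degrees of freedom.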


If we are presented with an $N$-way contingency table, that is, one in which the individual elements are assigned to the cells of the table by $N$ different criteria, the hypothesis of mutual independence of the $N$ criteria may be tested by a simple extension of the rules formulated for the $r \times c$ table. As usual, we shall compute the sum (over all cells) of "(observed - expected)$^2$/expected," where the expected value in any cell is given by the product of the marginal (border) totals associated with the row, column, etc., in which the cell is located divided by $n^{N-1}$. The resulting statistic is approximately distributed as chi-square with "$(r - 1)(c - 1) \cdots$" degrees of freedom, where there are $r$ rows, $c$ columns, etc., in the $N$-way table. Other hypotheses may also be tested in such tables (for example, see Mood (12)), but we shall not discuss these at this time.


7.18      SPECIAL APPROXIMATE METHODS FOR 2X2 TABLES


If the contingency table consists of two rows and two columns, as in Table 7.16, a short-cut method of computing chi-square is available. The appropriate formula is

$$\chi^2 = n(ad - bc)^2 / k \qquad (7.26)$$

where $n = a + b + c + d$ and $k = (a + b)(c + d)(a + c)(b + d)$. This will give the same numerical value of chi-square that would be obtained if the procedure of Section 7.17 were followed. It should be clear that the chi-square statistic thus obtained will have only 1 degree of freedom.

TABLE 7.16-A 2X2 Table

            $A_1$     $A_2$     Total

$B_1$       a         b         a+b
$B_2$       c         d         c+d

Total       a+c       b+d       n

As in Section 7.7, a correction for continuity may be used to sharpen the approximation. This is accomplished by calculating

$$\chi^2 = \sum_{i} \sum_{j} \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}} = n(|ad - bc| - n/2)^2 / k \qquad (7.27)$$

It must be remembered that this correction should not be applied to $r \times c$ tables in which $r > 2$ and/or $c > 2$.


Example  7.30 

A random sample of 250 men and 250 women were polled as to their desires concerning the ownership of television sets. The data in Table 7.17 resulted. Calculation by either method yielded

$$\chi^2 = 13.3 > \chi^2_{.99(1)} = 6.63.$$

Thus, the hypothesis that desire to own a television set is independent of sex is rejected.

TABLE 7.17-Results of Sample Poll on Television Ownership

Classification             Men     Women     Total

Want television             80      170       250
Don't want television      120      130       250

Total                      200      300       500
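The short-cut formula (7.26) and its continuity-corrected form (7.27) are easy to evaluate for the counts of Table 7.17. The sketch below is illustrative only; the function name `chi_square_2x2` and its `continuity` flag are assumptions, as is the guard that keeps the corrected difference from going negative.

```python
def chi_square_2x2(a, b, c, d, continuity=False):
    """Short-cut chi-square (1 d.f.) for a 2x2 table laid out as in Table 7.16."""
    n = a + b + c + d
    k = (a + b) * (c + d) * (a + c) * (b + d)
    difference = abs(a * d - b * c)
    if continuity:
        # Equation (7.27): subtract n/2; the floor at zero is an added safeguard.
        difference = max(difference - n / 2, 0.0)
    return n * difference ** 2 / k


# Table 7.17: a = 80, b = 170, c = 120, d = 130.
plain = chi_square_2x2(80, 170, 120, 130)
corrected = chi_square_2x2(80, 170, 120, 130, continuity=True)

print(round(plain, 2), round(corrected, 2))
```

Both values exceed the tabulated 1 per cent point for 1 degree of freedom (6.63), so the conclusion of Example 7.30 is the same with or without the correction.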

7.19  THE  EXACT  METHOD  FOR  2X2  TABLES 

It should be noted that an alternative way of looking at a 2X2 table is to consider the two fractions, $\hat{p}_1 = a/(a + b)$ and $\hat{p}_2 = c/(c + d)$, as estimates of $p_1$ and $p_2$, the parameters of two binomial populations. In this frame of reference, a comparison of $\hat{p}_1$ and $\hat{p}_2$ should yield evidence relative to the hypothesis $H: p_1 = p_2$. If we wish to test $H: p_1 = p_2$ versus $A: p_1 \neq p_2$, we may use the approximate method of the preceding section or, if we choose, an equivalent test based on the normal approximation.

In this case, however, the exact test procedure is not too difficult to apply, especially if a digital computer is available. Thus, it seems appropriate to indicate the nature of the exact method.

It can be shown that the exact probability of observing $\hat{p}_1 = a/(a + b)$ and $\hat{p}_2 = c/(c + d)$ when $p_1 = p_2$ is

$$P_1 = \frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{a!\,b!\,c!\,d!\,n!} \qquad (7.28)$$


To obtain the final probability to be used in assessing the validity of $H: p_1 = p_2$, it is necessary to add to $P_1$ the probabilities of more divergent fractions than those observed. Assuming $\hat{p}_1 < \hat{p}_2$ (and the table can always be arranged to make this so), the next more divergent situation would be the one in which $a$ and $d$ are each decreased by unity, and $b$ and $c$ are each increased by unity. For this array, we calculate

$$P_2 = \frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{(a - 1)!\,(b + 1)!\,(c + 1)!\,(d - 1)!\,n!} \qquad (7.29)$$


The cell entries are again changed, following the same rule as before, and $P_3$ is calculated. Continue in this manner until $P_{a+1}$ is calculated. Then, if

$$P = \sum_{i=1}^{a+1} P_i \qquad (7.30)$$

is less than or equal to $\alpha$, the hypothesis $H: p_1 = p_2$ should be rejected.
Example 7.31

Robertson (13) reported on the analysis of an experiment involving the evaluation of a silicon dip as a protection for vacuum tubes. The data shown in Table 7.18 were obtained. Using the procedure outlined above, he found $P = P_1 + P_2 + P_3 = .0957$. He then concluded that "the failure rate for protected tubes is just barely significantly less than that for unprotected tubes." Apparently a 10 per cent significance level had been decided upon prior to the analysis.

TABLE 7.18-Success-Failure Results From an Experiment
on 690 Vacuum Tubes*

                 Failures    Nonfailures

Protected            2           338
Unprotected          7           343

* Source: W. H. Robertson, "Programming Fisher's exact method of comparing two percentages," Technometrics, Vol. 2, No. 1, pp. 103-7, Feb., 1960.
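The exact procedure of Equations (7.28) through (7.30) can be carried out with integer factorials. The following is a minimal sketch rather than Robertson's program; the function names are illustrative, and it assumes the table has already been arranged so that $a/(a+b)$ is the smaller fraction and $a \leq d$.

```python
from fractions import Fraction
from math import factorial


def table_probability(a, b, c, d):
    """Exact probability of one 2x2 array with fixed margins, as in Equation (7.28)."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a) * factorial(b) * factorial(c)
                   * factorial(d) * factorial(n))
    return Fraction(numerator, denominator)


def exact_probability(a, b, c, d):
    """Sum P_1 + P_2 + ... + P_(a+1) over the observed and more divergent arrays."""
    total = Fraction(0)
    while a >= 0 and d >= 0:
        total += table_probability(a, b, c, d)
        # Next more divergent array: a and d down by one, b and c up by one.
        a, b, c, d = a - 1, b + 1, c + 1, d - 1
    return float(total)


# Table 7.18: protected (2 failures, 338 nonfailures), unprotected (7, 343).
p_value = exact_probability(2, 338, 7, 343)
print(round(p_value, 4))
```

With $a = 2$ the sum has three terms, matching the $P_1 + P_2 + P_3$ of Example 7.31. Exact rational arithmetic is used here only to avoid overflow in the large factorials.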

7.20     SEVERAL NORMAL POPULATIONS; $H: \mu_1 = \mu_2 = \cdots = \mu_k$

In Section 7.9 a $t$-test was proposed for testing $H: \mu_1 = \mu_2$ versus $A: \mu_1 \neq \mu_2$ under the assumption that $\sigma_1^2 = \sigma_2^2$. Now we wish to propose a procedure for handling the situation in which we have $k$ normal populations, $k > 2$.

Intuitively, it seems reasonable that the validity of the hypothesis $H: \mu_1 = \mu_2 = \cdots = \mu_k$ should be assessed by comparing the sample estimates of $\mu_1, \mu_2, \ldots, \mu_k$. That is, it is to be expected that any suggested test procedure will involve a comparison of $\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_k$. (NOTE: The choice of $Y$ rather than $X$ as the symbol denoting the characteristic was prompted solely by the desire to agree with symbolism to be used in certain techniques that will be presented later in the book.)

If the assumption is made that $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$, that is, if homogeneous variances are assumed, the appropriate test procedure is to calculate

$$F = \frac{\sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2 \,/\, (k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 \,/\, \sum_{i=1}^{k} (n_i - 1)} \qquad (7.31)$$

where

$Y_{ij}$ = $j$th observation in the $i$th group (sample); $i = 1, \ldots, k$; $j = 1, \ldots, n_i$      (7.32)

$n_i$ = number of observations in the $i$th group      (7.33)

$\bar{Y}_i = \sum_{j=1}^{n_i} Y_{ij} / n_i$ = mean of the observations in the $i$th group      (7.34)

$\bar{Y} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij} \,/\, \sum_{i=1}^{k} n_i$ = mean of all observations.      (7.35)

Then, if $F > F_{(1-\alpha)(\nu_1, \nu_2)}$, where $\nu_1 = k - 1$ and $\nu_2 = \sum_{i=1}^{k} (n_i - 1)$, the hypothesis $H: \mu_1 = \mu_2 = \cdots = \mu_k$ would be rejected. (NOTE: The reader may easily verify that, if $k = 2$, the procedure outlined in this section is algebraically equivalent to that of Section 7.9.)

Before presenting an example, a convenient tabular form for carrying out the specified test procedure will be indicated. Invoking the identity

$$\sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 = \Big(\sum_{i=1}^{k} n_i\Big) \bar{Y}^2 + \sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 \qquad (7.36)$$

or, in abbreviated form,

$$\sum Y^2 = M_{yy} + G_{yy} + W_{yy} \qquad (7.37)$$

the necessary calculations are conveniently presented as in Table 7.19. (NOTE: Such a table is usually referred to as an analysis of variance table.) Actually, the labor involved in calculating the various sums of squares may be materially reduced if we use the following algebraically equivalent forms:

TABLE 7.19-Tabular Presentation of the F-test for the Equality of
Means of k Normal Populations Under the Assumption
of Homogeneous Variances

Source of          Degrees of                    Sum of        Mean
Variation          Freedom                       Squares       Square                                  F-Ratio

Mean               1                             $M_{yy}$      $M = M_{yy}$
Among groups       $k - 1$                       $G_{yy}$      $G = G_{yy}/(k - 1)$                    $G/W$
Within groups      $\sum_{i=1}^{k} (n_i - 1)$    $W_{yy}$      $W = W_{yy} / \sum_{i=1}^{k} (n_i - 1)$

Total              $\sum_{i=1}^{k} n_i$          $\sum Y^2$
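$\sum Y^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2$ = sum of the squares of all the observations,      (7.38)

$M_{yy} = T^2 \,/\, \sum_{i=1}^{k} n_i$,      (7.39)

$G_{yy} = \sum_{i=1}^{k} (T_i^2 / n_i) - M_{yy}$,      (7.40)

and

$W_{yy} = \sum Y^2 - M_{yy} - G_{yy}$.      (7.41)

In the above equations,

$T_i = \sum_{j=1}^{n_i} Y_{ij}$ = total of the observations in the $i$th group,      (7.42)

$T = \sum_{i=1}^{k} T_i$ = total of all observations,      (7.43)

and

$\sum_{i=1}^{k} n_i$ = total number of observations in all the groups combined.      (7.44)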


Example  7.32 

Consider the data of Table 7.20. Using Equations (7.38) through (7.41), the results shown in Table 7.21 were obtained. Since

$$F = 72 > F_{.99(3,16)} = 5.29,$$

the hypothesis $H: \mu_1 = \mu_2 = \mu_3 = \mu_4$ is rejected at the 1 per cent significance level.

TABLE 7.20-Sample Data From Four Normal Populations To Be Used
in Example 7.32

                          Groups
                   1       2       3       4

Observations      45      35      34      41
                  46      33      34      41
                  49              35      44
                  44              34      43
                                  33      41
                                          42
                                          44
                                          41
                                          41

Totals           184      68     170     378

Means             46      34      34      42
TABLE 7.21-Analysis of Variance Using the Data of Table 7.20 To Test
the Hypothesis $H: \mu_1 = \mu_2 = \mu_3 = \mu_4$

Source of         Degrees of     Sum of      Mean
Variation         Freedom        Squares     Square     F-Ratio

Mean                   1         32,000      32,000
Among groups           3            432         144        72
Within groups         16             32           2

Total                 20         32,464
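The sums of squares in Table 7.21 follow directly from the group totals of Table 7.20. The sketch below is illustrative only; the variable names mirror the quantities of the analysis of variance table (mean, among-groups, and within-groups sums of squares) but are not taken from the text.

```python
# Observations from Table 7.20, one list per group.
groups = [
    [45, 46, 49, 44],
    [35, 33],
    [34, 34, 35, 34, 33],
    [41, 41, 44, 43, 41, 42, 44, 41, 41],
]

n_total = sum(len(g) for g in groups)            # total number of observations
grand_total = sum(sum(g) for g in groups)        # total of all observations
sum_squares = sum(y * y for g in groups for y in g)

# Sum of squares due to the mean: (grand total)^2 / (number of observations).
ss_mean = grand_total ** 2 / n_total
# Among-groups sum of squares: sum of (group total)^2 / (group size), less ss_mean.
ss_among = sum(sum(g) ** 2 / len(g) for g in groups) - ss_mean
# Within-groups sum of squares, obtained by subtraction.
ss_within = sum_squares - ss_mean - ss_among

df_among = len(groups) - 1
df_within = n_total - len(groups)

f_ratio = (ss_among / df_among) / (ss_within / df_within)
print(ss_mean, ss_among, ss_within, f_ratio)
```

The computed F-ratio is then compared with the tabulated $F$ value for 3 and 16 degrees of freedom, as in Example 7.32.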

7.21      SEVERAL NORMAL POPULATIONS; $H: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$


In Section 7.11 an $F$-test was proposed for testing the hypothesis $H: \sigma_1^2 = \sigma_2^2$ versus $A: \sigma_1^2 \neq \sigma_2^2$. At this time, we wish to consider the situation in which we have $k$ normal populations, $k > 2$. Several test procedures have been proposed for handling this type of problem, but only the method due to Bartlett (2) will be presented in this book.

As in Section 7.20, the sample observations will be denoted by $Y_{ij}$ ($i = 1, \ldots, k$; $j = 1, \ldots, n_i$). Other symbols will also be defined as in the preceding section and, in addition, we will denote $Y_{ij} - \bar{Y}_i$ by $y_{ij}$. Thus, in agreement with an earlier definition,

$$s_i^2 = \sum_{j=1}^{n_i} y_{ij}^2 \,/\, (n_i - 1).$$

Using this notation, the mechanics of Bartlett's procedure are as shown in Table 7.22. If, in this table, $\chi^2 \geq \chi^2_{(1-\alpha)(k-1)}$, the hypothesis $H: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ would be rejected. (NOTE: The researcher will find it necessary to compute the corrected value of chi-square only if the uncorrected chi-square falls close to and above the tabulated value, and then only if he wishes to obtain a very accurate evaluation of the exact probability of Type I error.)


Example  7.33 

Consider the data of Table 7.23. Following the procedure indicated in Table 7.22, the results presented in Table 7.24 are obtained. It is seen that $\chi^2 = 2.81 < \chi^2_{.95(3)} = 7.81$, and thus the hypothesis of homogeneous variances may not be rejected at the 5 per cent significance level. (NOTE: There was no need to calculate the corrected value of chi-square in this example; the computations were carried out only to illustrate the method.)
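Bartlett's computations for Example 7.33 can be sketched in a few lines. This is an illustration under stated assumptions: the assignment of the 24 observations to the four samples is read from the columns of Table 7.23 and agrees with the sums of squares in Table 7.24, natural logarithms replace the common logarithms and the factor 2.3026 of the tabular method (the two forms are algebraically the same), and all names are hypothetical.

```python
from math import log

# Four samples as read from the columns of Table 7.23.
samples = [
    [48, 49, 67, 75, 53, 33],
    [42, 39, 51, 57, 75],
    [33, 42, 46, 47, 50],
    [78, 69, 60, 52, 63, 45, 50, 35],
]


def corrected_sum_of_squares(values):
    """Sum of squared deviations from the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


ss = [corrected_sum_of_squares(s) for s in samples]
df = [len(s) - 1 for s in samples]

# Pooled estimate of variance: total within-sample sum of squares over total d.f.
pooled_variance = sum(ss) / sum(df)

# Uncorrected chi-square with k - 1 degrees of freedom.
chi_square = sum(df) * log(pooled_variance) - sum(
    d * log(s / d) for s, d in zip(ss, df)
)

# Correction factor C = 1 + [sum(1/d.f.) - 1/sum(d.f.)] / [3(k - 1)].
k = len(samples)
correction = 1 + (sum(1 / d for d in df) - 1 / sum(df)) / (3 * (k - 1))
chi_square_corrected = chi_square / correction

print(round(chi_square, 2), round(correction, 4), round(chi_square_corrected, 2))
```

Small differences from Table 7.24 in the later decimal places arise because the table carries rounded intermediate values.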


7.22      SAMPLE  SIZE 

A question frequently asked of statisticians is, "How large a sample is needed for this experiment?" The question is deceptively simple,
TABLE 7.22-Computations for Bartlett's Test for Homogeneity of Variance

Sample    $\sum_j y_{ij}^2$              Degrees of Freedom      1/d.f.                      $s_i^2$     $\log_{10} s_i^2$     (d.f.) $\log_{10} s_i^2$

1         $\sum_{j=1}^{n_1} y_{1j}^2$    $n_1 - 1$               $1/(n_1 - 1)$               $s_1^2$     $\log_{10} s_1^2$     $(n_1 - 1) \log_{10} s_1^2$
2         $\sum_{j=1}^{n_2} y_{2j}^2$    $n_2 - 1$               $1/(n_2 - 1)$               $s_2^2$     $\log_{10} s_2^2$     $(n_2 - 1) \log_{10} s_2^2$
...       ...                            ...                     ...                         ...         ...                   ...
k         $\sum_{j=1}^{n_k} y_{kj}^2$    $n_k - 1$               $1/(n_k - 1)$               $s_k^2$     $\log_{10} s_k^2$     $(n_k - 1) \log_{10} s_k^2$

Sum       $W_{yy}$                       $\sum_{i=1}^{k} (n_i - 1)$    $\sum_{i=1}^{k} 1/(n_i - 1)$                            $\sum_{i=1}^{k} (n_i - 1) \log_{10} s_i^2$

Pooled estimate of variance = $s^2 = W_{yy} \,/\, \sum_{i=1}^{k} (n_i - 1)$

$B = (\log_{10} s^2) \sum_{i=1}^{k} (n_i - 1)$

$\chi^2 = (\log_e 10) \left[ B - \sum_{i=1}^{k} (n_i - 1) \log_{10} s_i^2 \right]$

Correction factor = $C = 1 + \frac{1}{3(k - 1)} \left[ \sum_{i=1}^{k} \frac{1}{n_i - 1} - 1 \Big/ \sum_{i=1}^{k} (n_i - 1) \right]$

Corrected $\chi^2 = (1/C)$ (uncorrected $\chi^2$)

Note: $\log_e 10 = 2.3026$


but the answer is hard to find. Before the statistician can provide anything better than an "educated guess," he must retaliate with several questions, the answers to which should enable him to attack the problem with some hope of reaching a valid answer. Frustrating as this may be to the researcher, it frequently serves a very good purpose, for it forces the researcher to give serious thought to several aspects of his

TABLE 7.23-Four Samples From Normal Populations
$H: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$

 1       2       3       4

48      42      33      78
49      39      42      69
67      51      46      60
75      57      47      52
53      75      50      63
33                      45
                        50
                        35

TABLE 7.24-Computations for Bartlett's Test: Data from Table 7.23

Sample    $\sum_j y_{ij}^2$    Degrees of Freedom    1/d.f.    $s_i^2$    $\log_{10} s_i^2$    (d.f.) $\log_{10} s_i^2$

1             1113.0                  5              .2000      222.6        2.34753              11.73765
2              820.8                  4              .2500      205.2        2.31218               9.24872
3              173.2                  4              .2500       43.3        1.63649               6.54596
4             1330.0                  7              .1428      190.0        2.27875              15.95125

Sum           3437.0                 20              .8428                                        43.48358

Pooled estimate of variance = $s^2$ = 3437/20 = 171.85

$B = (\log_{10} s^2) \sum_{i=1}^{4} (n_i - 1)$ = (2.23515)(20) = 44.7030

$\chi^2$ = (2.3026)(44.7030 - 43.48358) = 2.80784

Correction factor = $C$ = 1 + [1/3(3)](.8428 - 1/20) = 1.0881

Corrected $\chi^2$ = 2.80784/1.0881 = 2.5805


problem.  To  illustrate,  some  of  the  questions  that  might  be  asked  by 
the  statistician  are: 

(1)  What is your hypothesis? What are the alternatives?

(2)  What are you trying to estimate?

(3)  What significance level are you planning to use? What confidence level?

(4)  How large a difference do you wish to be reasonably certain of detecting? With what probability?

(5)  What width confidence interval can you tolerate?

(6)  What do you expect the variability of your data to be?

When answers to these and other questions are provided by the researcher, the statistician can be of help in determining the needed sample size.

Before you get the impression that all is lost, let me hasten to assure you that the picture is not all black. In some cases, fairly simple formulas are available for estimating the required sample size. Also, if OC curves are available for the test procedure to be used, the required sample size may be determined upon examination of these curves. Tables have also been provided for certain procedures and four of these are reproduced in Appendices 9 through 12 for your use. If all three of these approaches (that is, formulas, OC curves, and tables) fail to meet your demands, a professional statistician should be consulted.

Example  7.34 

Consider testing the hypothesis $H: \mu = \mu_0$ versus $A: \mu \neq \mu_0$ at the 5 per cent significance level. If $\sigma$ is estimated to be 0.8 and a difference $\delta = |\mu - \mu_0| = 1.2$ is to be detected with probability 0.9 (this is equivalent to setting $\beta = 0.1$ at $\mu = \mu_0 - 1.2$ and at $\mu = \mu_0 + 1.2$), how large a sample is needed? Setting $D = 1.2/0.8 = 1.5$ and consulting Appendix 9, it is found that $n = 7$.

Example  7.35 

Consider testing $H: \mu \leq \mu_0$ versus $A: \mu > \mu_0$ at the 1 per cent significance level. If $\sigma$ is estimated to be 1.2 and $\delta = \mu - \mu_0 = 0.9$ is to be detected with probability 0.95, how large a sample is needed? Setting $D = 0.9/1.2 = 0.75$ and consulting Appendix 9, it is found that $n = 31$.
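As a rough cross-check on the tabled values in Examples 7.34 and 7.35, a simple formula of the kind mentioned earlier in this section can be evaluated. The sketch below uses the familiar normal approximation $n \approx [(z_\alpha + z_\beta)/D]^2$; this is not the method behind Appendix 9, and the function name and arguments are illustrative assumptions. Because the approximation treats $\sigma$ as known, it gives somewhat smaller sample sizes than the tables, which allow for $\sigma$ being estimated.

```python
from math import ceil
from statistics import NormalDist


def approximate_sample_size(d, alpha, beta, two_sided):
    """Normal-approximation sample size for a test on one mean.

    d is the standardized difference (delta / sigma); alpha and beta are the
    risks of the two kinds of error. Sigma is treated as known, so the result
    tends to understate sample sizes that allow for an estimated sigma.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(1 - beta)
    return ceil(((z_alpha + z_beta) / d) ** 2)


# Example 7.34: two-sided, alpha = 0.05, power 0.9, D = 1.5 (Appendix 9 gives 7).
n_example_734 = approximate_sample_size(1.5, 0.05, 0.10, two_sided=True)
# Example 7.35: one-sided, alpha = 0.01, power 0.95, D = 0.75 (Appendix 9 gives 31).
n_example_735 = approximate_sample_size(0.75, 0.01, 0.05, two_sided=False)

print(n_example_734, n_example_735)
```

The approximation is best regarded as a lower bound and a sanity check; the tabled values should be preferred when they are available.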

Example  7.36 

Consider testing $H: \mu_1 \leq \mu_2$ versus $A: \mu_1 > \mu_2$ at the 2.5 per cent significance level. If $\sigma$ is estimated to be 1.0 and $\delta = \mu_1 - \mu_2 = 1.6$ is to be detected with probability 0.99, how large should the two samples be? Setting $D = 1.6/1.0 = 1.6$ and consulting Appendix 10, it is found that $n_1 = n_2 = 16$.

Example  7.37 

Consider testing $H: \mu_1 = \mu_2$ versus $A: \mu_1 \neq \mu_2$ at the 1 per cent significance level. If $\sigma$ is estimated to be 1.5 and $\delta = |\mu_1 - \mu_2| = 1.8$ is to be detected with probability 0.95, how large should the two samples be? Setting $D = 1.8/1.5 = 1.2$ and consulting Appendix 10, it is found that $n_1 = n_2 = 27$.

Example  7.38 

Consider testing $H: \sigma^2 \leq \sigma_0^2$ versus $A: \sigma^2 > \sigma_0^2$ at the 5 per cent significance level. If a value of $\sigma^2 = 4\sigma_0^2$ is to be detected with probability 0.99, how large a sample is needed? Using $R = 4$ and consulting Appendix 11, it is seen that $15 < \nu < 20$. Crude interpolation suggests $\nu = 19$ or $n = \nu + 1 = 20$.


Example  7.39 

Consider testing $H: \sigma^2 \geq \sigma_0^2$ versus $A: \sigma^2 < \sigma_0^2$ at the 5 per cent significance level. If a value of $\sigma^2 = 0.33\sigma_0^2$ is to be detected with probability 0.99, how large a sample is needed? Since Appendix 11 is constructed for values of $R > 1$, a slight change in procedure (from Example 7.38) is required. The table in Appendix 11 is entered with $\alpha' = \beta = 0.01$, $\beta' = \alpha = 0.05$, and $R' = 1/R = 3$. Thus, it is noted that $24 < \nu < 30$. Crude interpolation suggests $\nu = 26$ or $n = \nu + 1 = 27$. (NOTE: Although the roles of $\alpha$ and $\beta$ were interchanged when Appendix 11 was consulted, the actual test would be carried out at the original value of $\alpha$ which, in this example, was 0.05.)

Example  7.40 

Consider testing $H: \sigma_1^2 \geq \sigma_2^2$ versus $A: \sigma_1^2 < \sigma_2^2$ at the 5 per cent significance level. If a value of $\sigma_2^2 = 4\sigma_1^2$ is to be detected with probability 0.99, how large should the two samples be? Using $R = 4$ and consulting Appendix 12, it is noted that $30 < \nu_1 = \nu_2 < 40$. Crude interpolation suggests that $\nu_1 = \nu_2 = 34$ or $n_1 = n_2 = 35$.

7.23      SEQUENTIAL TESTS

In all the test procedures described thus far, the sample size has been decided upon in advance. As has been implied, the determination of the proper sample size is often difficult. However, given the necessary information (e.g., an estimate of the variability to be encountered and statements concerning the allowable risks associated with incorrect decisions), the required sample size may be specified (see Section 7.22). The reader should realize, though, that there is a certain "cost" attached to such an approach. That is, there is an implicit assumption in the fixed (predetermined) sample size approach that a sample of the specified size will be taken, and observations recorded for each sample unit, regardless of whether all the observations are needed to reach a decision. In view of this and in the hope of achieving economies due to reduced sample sizes, it seems desirable to seek a test procedure in which the sampling may be terminated as soon as it is possible to reach a decision to either accept or reject the hypothesis under test. For certain specific cases, namely, those which involve a simple hypothesis $H: \theta = \theta_0$ and a single alternative $A: \theta = \theta_1$, such a test has been devised. It is known as the sequential probability ratio test. In the remainder of this section, the general nature of this procedure will be described and certain specific applications illustrated.

The sequential method of testing proceeds as follows: Sample units are randomly selected one at a time (i.e., sequentially) and, after each observation is obtained, one of the following decisions is made:

(1)  Accept $H: \theta = \theta_0$ (i.e., reject $A: \theta = \theta_1$).

(2)  Reject $H: \theta = \theta_0$ (i.e., accept $A: \theta = \theta_1$).

(3)  Obtain an additional observation.

To determine which of these three decisions is appropriate, the analyst should calculate

$$R_n = \prod_{i=1}^{n} \frac{f_1(x_i)}{f_0(x_i)} \qquad (7.46)$$

where $f_0(x)$ is the probability function (or probability density function) under the assumption that $H: \theta = \theta_0$ is true and $f_1(x)$ is the probability function (or probability density function) under the assumption that $A: \theta = \theta_1$ is true. Then, depending on the value of $R_n$, one of the three decisions previously listed is reached by proceeding according to the following rule:

(1)  If $R_n \leq \beta/(1 - \alpha)$, accept $H$ (i.e., reject $A$).

(2)  If $R_n \geq (1 - \beta)/\alpha$, reject $H$ (i.e., accept $A$).

(3)  If $\beta/(1 - \alpha) < R_n < (1 - \beta)/\alpha$, obtain an additional observation.
In the above, $\alpha$ and $\beta$ are, respectively, the preassigned risks of: (1) rejecting $H$ when $H$ is true and (2) accepting $H$ when $A$ is true. If $a_n$ and $r_n$ are used to denote the acceptance and rejection values, respectively, for a test statistic, the decision rule may be restated in the following form:

(1)  If the value of the test statistic is less than or equal to $a_n$, accept $H$ (i.e., reject $A$).

(2)  If the value of the test statistic is greater than or equal to $r_n$, reject $H$ (i.e., accept $A$).

(3)  If the value of the test statistic is greater than $a_n$ and less than $r_n$, continue sampling.

It should be clear, of course, that the sample size is a variable in a sequential procedure as contrasted to its role as a (predetermined) constant in the classical test procedures. Thus, in addition to examining the power of a sequential test procedure by studying its OC function, it is appropriate that its "cost" be assessed by considering the average size of sample required to reach the decision to accept or to reject. This analysis is usually made in terms of the ASN function, where the letters ASN stand for average sample number. Rather than go into details concerning the ASN function and the savings due to reduced sample sizes, let us be content with the general statement that the potential savings are considerable, in some cases as much as 50 per cent.

Considerable space could be devoted to a detailed discussion of the sequential probability ratio test for each of the commonly encountered situations. However, it is doubtful if such discussions would serve any useful purpose. Accordingly, the tests have been specified in Table 7.25.

Example  7.41 

Consider a binomial population and the hypothesis $H: p = 0.10$ versus the alternative $A: p = 0.20$. Let $\alpha = 0.01$ and $\beta = 0.05$. Then $\log [\beta/(1 - \alpha)] = -2.986$ and $\log [(1 - \beta)/\alpha] = 4.554$. If we represent a sample unit possessing the characteristic associated with $p$ by the symbol d and a unit not possessing this characteristic by g (e.g., defective and nondefective units, respectively), then the sequence

gggdgdggdgggddgdgddgd

would terminate at this point with the decision to reject $H$ and accept $A$.

Example  7.42 

Consider a normal population with known standard deviation, $\sigma = 10$. Test the hypothesis $H: \mu = 50$ versus the alternative $A: \mu = 70$. Let $\alpha = 0.01$ and $\beta = 0.01$. Then $\log [\beta/(1 - \alpha)] = -4.595$ and $\log [(1 - \beta)/\alpha] = 4.595$. If sequential sampling yielded, in the order shown, the following values of $X$ (60, 75, 65, 70), the sampling would terminate at this stage with the decision to reject $H$ and accept $A$.
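Examples 7.41 and 7.42 can both be traced with one small routine that accumulates the natural logarithm of the ratio $R_n$ of Equation (7.46) and compares it with $\log[\beta/(1 - \alpha)]$ and $\log[(1 - \beta)/\alpha]$. This is a minimal sketch; the function name `sprt` and the way the log-ratio increments are passed in are illustrative choices, not the layout of Table 7.25.

```python
from math import log


def sprt(increments, alpha, beta):
    """Sequential probability ratio test on a list of log-likelihood-ratio terms.

    Each increment is log[f1(x_i) / f0(x_i)] for one observation.
    Returns the decision and the number of observations used.
    """
    lower = log(beta / (1 - alpha))    # accept H at or below this value
    upper = log((1 - beta) / alpha)    # reject H at or above this value
    cumulative = 0.0
    for n, term in enumerate(increments, start=1):
        cumulative += term
        if cumulative <= lower:
            return "accept H", n
        if cumulative >= upper:
            return "reject H", n
    return "continue sampling", len(increments)


# Example 7.41: binomial, H: p = 0.10 versus A: p = 0.20.
p0, p1 = 0.10, 0.20
sequence = "gggdgdggdgggddgdgddgd"
binomial_terms = [log(p1 / p0) if unit == "d" else log((1 - p1) / (1 - p0))
                  for unit in sequence]
binomial_result = sprt(binomial_terms, alpha=0.01, beta=0.05)

# Example 7.42: normal with sigma = 10, H: mu = 50 versus A: mu = 70.
mu0, mu1, sigma = 50.0, 70.0, 10.0
observations = [60, 75, 65, 70]
normal_terms = [(mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
                for x in observations]
normal_result = sprt(normal_terms, alpha=0.01, beta=0.01)

print(binomial_result, normal_result)
```

In both examples the rejection boundary is first reached at the last observation shown, which is why sampling stops there.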
Problems

7.1  A company engaged in the casting of pig iron must be concerned with the per cent of silicon in the pig iron. The data given below constitute a random sample of the production records. Using $\alpha = 0.02$ and assuming normality, test the hypothesis that the process average is 0.85 grams of silicon per 100 grams of pig iron.

NUMBER OF GRAMS OF SILICON IN
100-GRAM SAMPLES OF PIG IRON

1.13   0.87   0.80   0.92   0.85   0.81   0.60   0.97   0.97   0.48
0.92   1.00   0.94   0.92   0.72   0.61   1.17   0.81   0.87   0.71
0.36   0.97   0.68   0.89   0.73   1.16   0.82   0.68   0.79   1.00

7.2  Consider these observations to represent the average hourly earnings during May, 1940, of a random selection of 50 male workers in a specified industry.

EARNINGS
(in Cents per Hour)

35    65    68    77    81    52    82    74    73    71
68    79    73    70    67    82    61    77    84    56
29    53    61    83    92    99    80    62    50    64
76    47    59    64    72    55    63   107    48    70
55    70    43    66    85    79    90    39    88    86

(a)  What is your best estimate of the average hourly earnings for all male workers in the industry?

(b)  How good is your estimate in (a) above? What is its standard error?

(c)  Establish confidence limits for your estimate in (a) above. Write out your statement about these confidence limits in words. State your assumptions clearly.

(d)  Is your estimate in (a) above in agreement with the hypothesized true value of 68 cents per hour for average earnings in May, 1940? Explain your answer.

(e)  What additional data would you need to estimate the total earnings in the industry for the month of May?

(f)  Test the hypothesis that $\mu < 80$.

7.3  Test the hypothesis that the mean life (in years) of wooden telephone
poles is less than 8 years. State any assumptions you make about the
following data:

LENGTH OF LIFE OF 1000 WOODEN
TELEPHONE POLES

Life (in years)           Number of Poles Replaced
 0.5 but under  1.5                  4
 1.5 but under  2.5                  7
 2.5 but under  3.5                 15
 3.5 but under  4.5                 32
 4.5 but under  5.5                 30
 5.5 but under  6.5                 57
 6.5 but under  7.5                 61
 7.5 but under  8.5                 73
 8.5 but under  9.5                 96
 9.5 but under 10.5                104
10.5 but under 11.5                103
11.5 but under 12.5                 95
12.5 but under 13.5                 91
13.5 but under 14.5                 73
14.5 but under 15.5                 64
15.5 but under 16.5                 38
16.5 but under 17.5                 30
17.5 but under 18.5                 18
18.5 but under 19.5                  5
19.5 but under 20.5                  1
20.5 but under 21.5                  1
21.5 but under 22.5                  2

Total                             1000
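
For grouped data such as the table of Problem 7.3, the mean and variance are ordinarily computed from class midpoints. The sketch below assumes each pole's life may be represented by its class midpoint (1, 2, ..., 22 years) and uses a large-sample normal statistic; both choices are assumptions of this illustration rather than requirements of the problem.

```python
import math

# Class frequencies from Problem 7.3 (classes 0.5-1.5, 1.5-2.5, ..., 21.5-22.5).
frequencies = [4, 7, 15, 32, 30, 57, 61, 73, 96, 104, 103,
               95, 91, 73, 64, 38, 30, 18, 5, 1, 1, 2]
midpoints = list(range(1, 23))  # assumed representative life for each class

n = sum(frequencies)
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
variance = sum(f * (m - mean) ** 2
               for f, m in zip(frequencies, midpoints)) / (n - 1)
std_err = math.sqrt(variance / n)

# Large-sample statistic for comparing the mean life with 8 years.
z = (mean - 8) / std_err
print(f"n = {n}, mean = {mean:.3f}, s.e. = {std_err:.4f}, z = {z:.2f}")
```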

7.4  A consumer panel report on the economic and geographic distribution
of the purchases of a particular product reveals among other things
that the nation's families bought, on the average, 17.5 lbs. of that
product in 1949. This estimate was based on returns from a supposed
random sample of 1225 families, and the standard deviation of
individual family purchases in this sample was found to be 7.5 lbs. From
sales and inventory records, it is determined that average purchases
per family in 1948 must have been at least 18.5 lbs., or 1 pound more
than the sample estimate for 1949. Could this difference of 1 pound
be due to sampling variation, or does it indicate that average
consumption of the product by families had decreased in 1949 from the
1948 level of consumption? What assumptions did you make?

7.5  Using the data in Problem 6.1, test the hypothesis
$H: \mu = 6.55 \times 10^{-27}$ versus $A: \mu \neq 6.55 \times 10^{-27}$. Let $\alpha = 0.01$.

7.6  Using the data in Problem 6.5 and letting $\alpha = 0.025$, test
$H: \mu \le 1.12$ inches versus $A: \mu > 1.12$ inches.

7.7  Using the data of Problem 6.6 and letting $\alpha = 0.05$, test
$H: \mu \ge 1.55$ versus $A: \mu < 1.55$.

7.8  Using the data of Problem 6.7 and letting $\alpha = 0.25$, test
$H: \mu \le 2900$ yards versus $A: \mu > 2900$ yards.

7.9  Using the data of Problem 6.1 and letting $\alpha = 0.01$, test
$H: \sigma \le 0.01 \times 10^{-27}$ versus $A: \sigma > 0.01 \times 10^{-27}$.

7.10  Using the data of Problem 6.5 and letting $\alpha = 0.10$, test
$H: \sigma^2 \le 0.0001$ versus $A: \sigma^2 > 0.0001$.

7.11  Using the data of Problem 6.6 and letting $\alpha = 0.005$, test
$H: \sigma \ge 0.05$ versus $A: \sigma < 0.05$.

7.12  Using the data of Problem 6.7 and letting $\alpha = 0.01$, test
$H: \sigma = 50$ yards versus $A: \sigma \neq 50$ yards.

7.13  In making a certain cross, a geneticist expected a segregation of 15
A's to 1 B. In a random sample of 800 he observed 730 A's and 70
B's. Do the data support the expected ratio? Why?
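
A chi-square comparison of observed counts with a hypothesized ratio, as in Problem 7.13, might be sketched as follows. The helper `chi_square_ratio`, the optional continuity correction of 0.5, and the 5 per cent critical value 3.841 (1 degree of freedom) are assumptions of this illustration; the problem itself does not fix a significance level.

```python
def chi_square_ratio(observed, ratio, continuity=False):
    """Chi-square statistic for observed counts against a hypothesized ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = 0.0
    for obs, exp in zip(observed, expected):
        deviation = abs(obs - exp)
        if continuity:
            # Reduce each absolute deviation by 0.5, never below zero.
            deviation = max(deviation - 0.5, 0.0)
        chi2 += deviation ** 2 / exp
    return chi2


observed = [730, 70]   # A's and B's in the sample of 800
ratio = [15, 1]        # expected segregation 15 A : 1 B

chi2_plain = chi_square_ratio(observed, ratio)
chi2_corrected = chi_square_ratio(observed, ratio, continuity=True)
CHI2_CRITICAL = 3.841  # assumed tabled 5 per cent point, 1 d.f.

print(f"uncorrected chi-square = {chi2_plain:.3f}")
print(f"corrected chi-square   = {chi2_corrected:.3f}")
print("exceeds critical value:", chi2_corrected > CHI2_CRITICAL)
```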

7.14  In a random sample of 400 farm operators, 65 per cent were owners
and 35 per cent were nonowners. Test the hypothesis that in the
population of farm operators 60 per cent are owners. Use a probability of
Type I error equal to .05.

7.15  A  manufacturer  of  light  bulbs  claims  that  on  the  average  1  per  cent 
or  less  of  all  the  light  bulbs  manufactured  by  his  firm  are  defective. 
A  random  sample  of  400  light  bulbs  contained  12  defectives.  On  the 
evidence  of  this  sample,   do  you  believe  the   manufacturer's  claim? 
Why?  Assume  that  the  maximum  risk  you  wish  to  run  of  falsely  reject 
ing  the  manufacturer's  claim — the  true  fraction  defective  is  .01 — has 
been  set  at  2  per  cent. 

7.16  A sampler of public opinion asked 400 randomly chosen persons from
some specified population whether they favored candidate A or B;
220 voted for A and 180 for B. Using a probability of Type I error
equal to .05, do you think that opinion in the population may have
been equally divided? Why?
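
One way to examine the question in Problem 7.16 is the normal approximation to the binomial. The sketch below is an illustration under that assumption; the function `proportion_z` and its optional continuity correction are names introduced here, not the book's notation.

```python
import math
from statistics import NormalDist


def proportion_z(successes, n, p0, continuity=False):
    """Normal-approximation statistic for H: p = p0."""
    deviation = successes - n * p0
    if continuity:
        # Shrink the deviation toward zero by 0.5 before standardizing.
        deviation = math.copysign(max(abs(deviation) - 0.5, 0.0), deviation)
    return deviation / math.sqrt(n * p0 * (1 - p0))


z = proportion_z(220, 400, 0.5)
z_corrected = proportion_z(220, 400, 0.5, continuity=True)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, corrected z = {z_corrected:.2f}")
print(f"two-sided probability (uncorrected) = {p_two_sided:.4f}")
```

With these data the uncorrected and corrected statistics fall on opposite sides of the two-sided 5 per cent point of 1.96, so the conclusion depends on whether the correction for continuity is applied.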

7.17  A  supermarket  is  to  be  built  in  a  new  location.  The  question  arose  as 
to  whether  provision  should  be  made  for  individual  customer  service 
at  the  meat  counter,  or  whether  a  self-service  counter  with  all  meats 
ready-cut  and  packaged  would  adequately  serve  customers  in  the  new 
area.  The  management  decision  was  that  individual  customer  service 
would  not  be  supplied  unless  40  per  cent  of  the  prospective  customers 
desired  such  service.  A  random  sample  of  160  prospective  customers 
showed  only  50  respondents  desiring  individual  service.  Does  it  appear 
that  the  proportion  of  preference  in  the  population   of  prospective 
customers  equals  or  exceeds  the  critical  level  set  by  management? 

7.18  In a triangular test for selecting judges to compose a taste panel, a
prospective judge was successful in selecting the odd sample 11 times
in 15 trials. Would you select him for the panel? How many would he
have to pick correctly to be chosen? What is the probability of Type I
error if we accept the above judge for our panel? Construct the
complete table of probabilities, showing them also in cumulative form, for
n = 15.
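
The table requested in Problem 7.18 can be generated directly from the binomial distribution. The sketch below assumes that a judge who is merely guessing picks the odd sample with probability 1/3 in each trial, which is the usual reading of a triangular test but is not stated in the problem; `binomial_table` is a name introduced for illustration.

```python
from math import comb


def binomial_table(n, p):
    """Return (pmf, upper_tail), where upper_tail[k] = P(X >= k)."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    upper_tail = []
    running = 0.0
    for prob in reversed(pmf):      # accumulate from k = n downward
        running += prob
        upper_tail.append(running)
    upper_tail.reverse()
    return pmf, upper_tail


pmf, upper = binomial_table(15, 1 / 3)   # p = 1/3 is an assumed guessing rate
print(" k   P(X = k)   P(X >= k)")
for k, (point, tail) in enumerate(zip(pmf, upper)):
    print(f"{k:2d}   {point:.6f}   {tail:.6f}")
```

The entry `upper[11]` is the probability that a guessing judge scores 11 or more, which is the Type I error incurred by accepting a judge at that score.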

7.19  Retail  sales  data  indicate  that  |  of  the  families  in  the  WOI-TV  area 
have  television  sets.  A  random  sample  of  900  families  from  the  area  is 
to  be  taken,  (a)  What  is  the  expected  number  of  television  families  for 
the  sample?  (b)  The  sample  yields  360  families  with  television  sets. 
Indicate  at  least  two  methods  by  which  we  may  obtain  approximate 
confidence  limits  for  the  population  proportion  of  families  owning  tele 
vision  sets,  (c)  Is  the  observed  number,  360,  in  "reasonable"  agree 
ment  with  the  expected  number? 

7.20  Eighty out of 1000 randomly chosen cases of diphtheria resulted in
death. What methods or techniques are available for using these
results to test the hypothesis that the true percentage of fatality is
10 per cent? State whether the tests are exact or approximate.

7.21  A  botanist  observed  350  seedlings  for  the  purpose  of  studying  chloro 
phyll  inheritance  in  corn.  The  seed  came  from  self-fertilized  hetero 
zygous  green  plants.  Hence,  green  and  yellow  seedlings  were  expected 
in  proportions  of  3  green  to  1  yellow.  The  sample  showed  120  green 
and  30  yellow  seedlings.  Is  this  sample  in  agreement  with  expectation? 

7.22  A  metropolitan  newspaper  was  considering  a  change  to  tabloid  form. 
A  random  sample  of  900  of  its  daily  readers  was  polled  to  secure 
readership  reaction  to  such  a  change.   Of  this  sample,   541    persons 
opposed  the  change  in  format  for  the  paper,  (a)  Is  it  likely  that  more 
than  50  per  cent  of  the  readers  are  in  favor  of  the  change?  (b)  Describe 
two  or  more  procedures  for  obtaining  confidence  limits  for  the  popu 
lation  proportion  opposed  to  the  change. 

7.23  From a keg containing 1000 bolts, a random sample of 20 bolts has
been presented to you for testing. One hundred per cent of the bolts
in the sample successfully pass the test. Of all the bolts in the keg,
what is your estimate of the percentage that will pass? What limits
would you place on the reliability of your estimate; that is, what
confidence statement would you make about the true percentage of all
the bolts that will pass the test?
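
When every item in the sample passes, as in Problem 7.23, a one-sided lower confidence limit can be obtained in closed form. The sketch below treats the sample as binomial, ignoring the fact that the keg holds only 1000 bolts, and uses a 95 per cent confidence coefficient; both are simplifying assumptions made here, and `lower_limit_all_pass` is a hypothetical helper.

```python
def lower_limit_all_pass(n, confidence):
    """Lower limit p such that P(all n pass | p) equals 1 - confidence.

    Binomial model: P(all n pass) = p ** n, so p = (1 - confidence) ** (1 / n).
    """
    return (1 - confidence) ** (1 / n)


lower_95 = lower_limit_all_pass(20, 0.95)
print(f"point estimate: 100 per cent; lower 95 per cent limit: {100 * lower_95:.1f} per cent")
```

A finite-population (hypergeometric) treatment would give a somewhat different limit, since 20 of the 1000 bolts are already known to pass.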

7.24  After a survey of opinion is made, point and interval estimates are
calculated. The investigator states that the 95 per cent confidence
interval is from 60 per cent to 75 per cent of the population in favor of a
law. Describe precisely the meaning of this statement.

7.25  Using the data of Problem 6.20 and letting $\alpha = 0.005$, test
$H: \mu_2 \le \mu_1$ versus $A: \mu_2 > \mu_1$.

7.26  Using the data of Problem 6.21 and letting $\alpha = 0.05$, test
$H: \mu_1 = \mu_2$ versus $A: \mu_1 \neq \mu_2$.

7.27  Using the data of Problem 6.22 and letting $\alpha = 0.10$, test
$H: \mu_1 = \mu_2$ versus $A: \mu_1 \neq \mu_2$.

7.28  We are told that the mean yields of two corn hybrids were 75 and 85
bushels per acre, respectively, and that each had been tried in 10
fields selected at random from some population of fields. Further,
assuming that $\sigma_1^2 = \sigma_2^2$, we are told that the standard error of each of
the above means was 3. Test the hypothesis that $\mu_1 = \mu_2$.
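
The quantities in Problem 7.28 combine as sketched below. The standard error of the difference is built from the two given standard errors, and the degrees of freedom follow from 10 fields per hybrid; the 5 per cent two-sided critical value 2.101 (18 degrees of freedom) is an assumption, since the problem does not name a significance level.

```python
import math

mean_1, mean_2 = 75.0, 85.0   # mean yields, bushels per acre
se_mean = 3.0                 # standard error of each mean
n_fields = 10                 # fields per hybrid

# Variances of independent means add, so the s.e. of the difference is:
se_diff = math.sqrt(se_mean ** 2 + se_mean ** 2)
t_stat = (mean_2 - mean_1) / se_diff
df = 2 * (n_fields - 1)

T_CRITICAL = 2.101  # assumed tabled value: two-sided 5 per cent, 18 d.f.
print(f"t = {t_stat:.3f} with {df} d.f.; exceeds critical value: {abs(t_stat) > T_CRITICAL}")
```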
7.29  The diameter of a cylinder was measured by 16 persons. Each person
made three determinations using a micrometer caliper and three
determinations using a vernier caliper. Following are the averages of
the three determinations (in inches), for each caliper, made by the
16 persons.

Micrometer   Vernier     Micrometer   Vernier     Micrometer   Vernier
  1.265       1.265        1.270       1.269        1.264       1.267
  1.265       1.267        1.267       1.273        1.266       1.272
  1.267       1.267        1.268       1.270        1.266       1.273
  1.266       1.266        1.267       1.270        1.268       1.267
  1.268       1.267        1.267       1.267        1.265       1.268
  1.265       1.267

Is there any difference between the means of the populations of
measurements represented by the two samples? The method to be used is
determined by the fact that each person used both calipers. Do you
think the difference is attributable to imperfections of the calipers or
to the difficulty of setting the vernier caliper?

7.30  The following are the lengths in millimeters of 6-year-old white
crappies from East Lake, Lucas County, Iowa, in 1948. Measurements
were made by William Lewis and T. S. English.


Males 


Females 


228 

217 

219 

231 

225 

219 

230 

217

222 

214 

224 

220 

225 

220 

221 

225 

221 

228 

222 

233 

239 

225 

234 

222 

227 

223 

223 

222 

223 

234 

241 

223 

225 

253 

220 

233 

213 

224 

235 

281 

224 

212 

218 

235 

231 

231 

220 

224 

264 

251 

321 

223 

246 

247 

214 

241 

272 

Is there any difference between the lengths of male and female
crappies of this age group in East Lake in 1948?

7.31  In order to test two methods of teaching spelling, 40 pupils were
randomly assigned to two classes and one method was tried on each class.
At the end of the trial a test was given. Following are the scores on the
tests:

    Method A          Method B
   10     48         20     57
   20     50         27     60
   25     51         35     63
   30     52         40     64
   33     54         41     65
   37     56         50     67
   41     57         50     67
   43     65         54     73
   46     73         56     83
   46     86         57     95

Test the hypothesis that the two methods of teaching spelling are
equally effective. State all your assumptions.

7.32  Using the data of Problem 6.23 and letting $\alpha = 0.01$, test
$H: \mu_D = 0$ versus $A: \mu_D \neq 0$.

7.33  A certain stimulus administered to each of 9 patients resulted in the
following increases in blood pressure: 5, 1, 8, 0, 3, 3, 5, -2, 4 mm. Hg.
Can it be concluded that the stimulus will be in general accompanied
by an increase in blood pressure?
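
Since each patient in Problem 7.33 supplies a single increase, the data may be handled as a set of differences and tested with a one-sample t statistic. The sketch below is an illustration only; the one-sided 5 per cent critical value 1.860 (8 degrees of freedom) is an assumed significance level, not one given in the problem.

```python
import math
import statistics

# Increases in blood pressure (mm. Hg) for the 9 patients.
increases = [5, 1, 8, 0, 3, 3, 5, -2, 4]

n = len(increases)
mean_d = statistics.fmean(increases)
s_d = statistics.stdev(increases)        # divisor n - 1
t_stat = mean_d / (s_d / math.sqrt(n))   # tests H: mean increase = 0
df = n - 1

T_CRITICAL = 1.860  # assumed tabled value: one-sided 5 per cent, 8 d.f.
print(f"mean = {mean_d:.2f}, s = {s_d:.2f}, t = {t_stat:.2f} with {df} d.f.")
print("exceeds critical value:", t_stat > T_CRITICAL)
```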

7.34  Suppose an investigator of group differences in I.Q. finds, for
independent random groups A and B of 11 subjects each assumed to be
from normal populations of same variance, a difference in sample
means of

$\bar{Y}_A - \bar{Y}_B = 3.9$ I.Q. points

and an estimated standard error of the mean difference of 2.0. He
selects $100\alpha$ as 5 per cent.

(a)  For the data as given, what hypothesis might he test? Perform
the required test and state your conclusions.

(b)  Suppose group A had been given special coaching designed to
"increase" I.Q., while group B had been maintained as a control.
What hypothesis might he test? Perform the required test
and give the resulting inferences.
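
The contrast between parts (a) and (b) of Problem 7.34 lies in whether the alternative is two-sided or one-sided. A minimal sketch of the comparison follows; the tabled values 2.086 and 1.725 (20 degrees of freedom) are supplied here as assumptions from a standard t table.

```python
difference = 3.9    # difference in sample means, I.Q. points
se_diff = 2.0       # estimated standard error of the difference
df = 2 * (11 - 1)   # two independent groups of 11 subjects

t_stat = difference / se_diff

T_TWO_SIDED = 2.086  # assumed tabled value: two-sided 5 per cent, 20 d.f.
T_ONE_SIDED = 1.725  # assumed tabled value: one-sided 5 per cent, 20 d.f.

reject_two_sided = abs(t_stat) > T_TWO_SIDED   # part (a): A: means differ
reject_one_sided = t_stat > T_ONE_SIDED        # part (b): A: coached mean is higher

print(f"t = {t_stat:.2f} with {df} d.f.")
print("two-sided test rejects:", reject_two_sided)
print("one-sided test rejects:", reject_one_sided)
```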

7.35  In  examining  the  resistance  to  crushing  offered  by  kernels  of  a  single 
ear  of  corn,  we  choose  at  random  two  lots  of  10  kernels  each  with  the 
following  results: 

CRUSHING RESISTANCE
(in points)

Lot I    Lot II
   8       18
   8       20
  14       20
  15       20
  16       22
  16       24
  17       27
  18       28
  18       30
  20       31

Using the method of paired observations, we find the difference
between the two means to be significant. We draw four more sets of two
samples, each time with a significant difference. This seems
surprising, since all the samples were taken from the kernels of the same ear.
Can you explain the results?

7.36  (a)  For the data given below, test the hypothesis that the true mean
tensile strength of the product of the C and C Manufacturing
Company is greater than the corresponding value for its
competitor. State all your assumptions.

(b)  Ignoring any assumption about variances you may have found it
necessary to make in (a) above, test the hypothesis that the two
population variances are equal.


TENSILE STRENGTH OF SCREW DRIVER OF 34 VALVE CAPS
PRODUCED BY THE C AND C MANUFACTURING COMPANY

         Tensile Strength             Tensile Strength
Test     in Pounds (Y)       Test     in Pounds (Y)
  1         130.1             19         153.5
  2         132.3             20         154.1
  3         133.4             21         154.7
  4         135.5             22         155.4
  5         137.7             23         156.7
  6         139.3             24         157.5
  7         140.4             25         158.4
  8         144.2             26         159.4
  9         145.0             27         160.7
 10         146.7             28         161.9
 11         147.4             29         163.1
 12         148.3             30         164.8
 13         149.7             31         169.3
 14         150.6             32         171.2
 15         151.1             33         174.0
 16         151.8             34         180.7
 17         152.1
 18         152.7

Total      5183.7

TENSILE STRENGTH OF SCREW DRIVER OF 36 VALVE CAPS
PRODUCED BY A COMPETITOR OF THE C AND C
MANUFACTURING COMPANY

         Tensile Strength             Tensile Strength
Test     in Pounds (Y)       Test     in Pounds (Y)
  1          65.7             20         149.4
  2         101.3             21         151.0
  3         103.0             22         153.3
  4         103.6             23         155.2
  5         107.2             24         157.6
  6         115.9             25         160.7
  7         117.4             26         164.3
  8         122.6             27         166.1
  9         126.5             28         168.8
 10         129.1             29         170.4
 11         132.3             30         180.6
 12         134.6             31         184.6
 13         135.2             32         188.8
 14         136.7             33         192.9
 15         138.3             34         196.0
 16         142.1             35         200.4
 17         143.4             36         204.8
 18         147.2
 19         148.2

Total      5295.2
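
The summary quantities needed for both parts of Problem 7.36 can be computed as sketched below. The value 152.1 used for test 17 of the C and C sample is an assumption: it is the figure that makes the column add to the printed total of 5183.7. The unpooled statistic shown for part (a) is one possible choice and is not necessarily the method intended by the text.

```python
import math
import statistics

c_and_c = [
    130.1, 132.3, 133.4, 135.5, 137.7, 139.3, 140.4, 144.2, 145.0,
    146.7, 147.4, 148.3, 149.7, 150.6, 151.1, 151.8,
    152.1,  # test 17: assumed value, chosen so the column adds to 5183.7
    152.7, 153.5, 154.1, 154.7, 155.4, 156.7, 157.5, 158.4, 159.4,
    160.7, 161.9, 163.1, 164.8, 169.3, 171.2, 174.0, 180.7,
]
competitor = [
    65.7, 101.3, 103.0, 103.6, 107.2, 115.9, 117.4, 122.6, 126.5,
    129.1, 132.3, 134.6, 135.2, 136.7, 138.3, 142.1, 143.4, 147.2,
    148.2, 149.4, 151.0, 153.3, 155.2, 157.6, 160.7, 164.3, 166.1,
    168.8, 170.4, 180.6, 184.6, 188.8, 192.9, 196.0, 200.4, 204.8,
]

n1, n2 = len(c_and_c), len(competitor)
mean1, mean2 = statistics.fmean(c_and_c), statistics.fmean(competitor)
var1, var2 = statistics.variance(c_and_c), statistics.variance(competitor)

# Part (b): variance ratio with the larger estimate in the numerator.
f_ratio = max(var1, var2) / min(var1, var2)

# Part (a): difference of means with the variances kept separate.
se_diff = math.sqrt(var1 / n1 + var2 / n2)
t_approx = (mean1 - mean2) / se_diff

print(f"means: {mean1:.2f} and {mean2:.2f}")
print(f"variances: {var1:.1f} and {var2:.1f}; F = {f_ratio:.2f}")
print(f"approximate t = {t_approx:.2f}")
```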

7.37  Using  the  data  which  follow,  test  the  hypothesis  that  the  true  mean 
crushing  strengths  of  air-dried  and  green  Douglas  fir  wood  are  the  same. 
State  all  your  assumptions  and  interpret  your  results. 
CRUSHING STRENGTHS OF 248 SAMPLES OF AIR-DRIED DOUGLAS FIR, SIZE
2" BY 2" BY 8". TESTED BY FOREST PRODUCTS LABORATORY DOMINION
GOVERNMENT, AT UNIVERSITY OF BRITISH COLUMBIA, 1945


N  1 

4713 

N  1 

5641 

N  9 

7145 

E  7 

6508 

S  5 

8413 

W  3 

7446 

2 

5516 

2 

5550 

E  3 

6200 

8 

6828 

6 

7690 

5 

7941 

3 

5956 

3 

7433 

4 

7501 

9 

6098 

7 

8484 

6 

8159 

4 

5652 

4 

7097 

5 

8086 

10 

6359 

8 

8139 

7 

9316 

5 

5951 

5 

7865 

6 

8055 

S  1 

5208 

9 

7595 

8 

9515 

6 

7178 

6 

8045 

7 

8042 

2 

4648 

10 

7021 

9 

8171 

7 

6630 

7 

7408 

8 

8678 

3 

7153 

11 

6416 

10 

9001 

8 

6284 

8 

7344 

9 

6710 

4 

6504 

W  3 

6657 

N  3 

8161 

9 

6246 

9 

7518 

10 

7512 

5 

6562 

5 

8264 

4 

7820 

10 

4689 

10 

7280 

S  1 

6438 

6 

7105 

7 

7268 

6 

8560 

11 

4825 

E  4 

7174 

3 

6074 

7 

7114 

8 

8101 

7 

8222 

12 

4697 

5 

7234 

4 

7170 

8 

6263 

9 

7066 

8 

8387 

E  3 

5757 

6 

8452 

5 

7306 

W  3 

5530 

10 

7301 

9 

7500 

4 

6661 

7 

8709 

6 

7760 

4 

6632 

N  1 

5961 

10 

2181 

5 

6098 

8 

7710 

7 

7049 

5 

6429 

2 

6254 

11 

7655 

6 

5867 

9 

7609 

8 

6863 

6 

6912 

3 

7247 

E  3 

7373 

7 

5573 

10 

6731 

9 

6987 

7 

7053 

4 

7480 

4 

7949 

8 

6282 

S  3 

6342 

10 

6511 

8 

6370 

5 

8512 

5 

8199 

9 

5536 

4 

6924 

W  3 

7025 

9 

7413 

6 

8911 

6 

8547 

10 

4941 

5 

7712 

4 

6775 

10 

6335 

7 

8988 

7 

8464 

S  2 

4003 

6 

6805 

5 

7754 

N  1 

6584 

8 

9330 

8 

8594 

3 

4789 

7 

7539 

6 

7495 

3 

7518 

9 

9899 

9 

7092 

4 

4889 

8 

7630 

7 

7990 

4 

7106 

10 

9025 

10 

7433 

5 

5304 

9 

7501 

8 

6149 

5 

7135 

11 

8920 

S  1 

6444 

6 

5350 

10 

7531 

9 

6774 

6 

7596 

E  3 

6419 

2 

6545 

7 

5601 

11 

6096 

10 

7137 

7 

7573 

4 

8403 

3 

7320 

8 

5932 

12 

6983 

N  1 

4858 

8 

7521 

5 

8220 

4 

7886 

9 

5245 

W  3 

6212 

3 

6148 

9 

7261 

6 

9501 

5 

8173 

10 

5585 

4 

6530 

4 

5388 

10 

6364 

7 

9250 

6 

7844 

11 

4313 

5 

7800 

5 

5883 

11 

6905 

8 

9479 

7 

7613 

12 

4924 

6 

7713 

6 

5930 

E  3 

7608 

9 

9985 

8 

8469 

W  3 

5196 

7 

7759 

7 

6252 

4 

6793 

10 

9686 

9 

7675 

4 

4810 

8 

7253 

8 

5920 

5 

7734 

11 

8849 

10 

7371 

5 

6641 

9 

6898 

9 

6260 

6 

6465 

S  3 

6693 

W  3 

7113 

6 

4625 

10 

7403 

10 

6403 

7 

7499 

4 

6338 

4 

7283 

7 

6704 

N  2 

6144 

11 

6644 

8 

7703 

5 

5976 

5 

8337 

8 

5555 

3 

6717 

12 

5841 

9 

7470 

7 

8495 

6 

8509 

9 

6813 

4 

7021 

E  3 

6650 

10 

7178 

8 

9184 

7 

7510 

10 

6061 

5 

8096 

4 

5802 

S  1 

6201 

9 

9485 

8 

8361 

11 

4959 

6 

7608 

S 

7287 

3 

7878 

11 

8507 

9 

7485 

12 

5618 

7 

8025 

6 

6379 

4 

7155 

12 

8270 

10 

8522 

14 

3958 

8 

8115 
CRUSHING STRENGTHS OF 248 SAMPLES OF GREEN DOUGLAS FIR, SIZE 2" BY
2" BY 8". TESTED BY FOREST PRODUCTS LABORATORY DOMINION
GOVERNMENT, AT UNIVERSITY OF BRITISH COLUMBIA, 1945


N  1 

2428 

W13 

2343 

N  7 

3639 

E  3 

3446 

S  3 

3412 

S  7 

4088 

2 

2173 

N  2 

2603 

8 

3645 

4 

2892 

4 

3904 

8 

4377 

3 

2896 

3 

2911 

9 

3487 

5 

3629 

5 

4030 

9 

4267 

4 

2980 

4 

3158 

10 

3351 

6 

3442 

6 

4212 

10 

4256 

5 

3378 

5 

3553 

E  3 

3591 

7 

3412 

7 

4423 

11 

4109 

6 

3167 

6 

3659 

4 

2849 

8 

3477 

8 

4575 

12 

3325 

7 

3208 

7 

3800 

5 

3911 

9 

3474 

9 

4318 

W  3 

3297 

8 

3342 

8 

3645 

6 

2591 

10 

3007 

10 

3829 

4 

3606 

9 

2982 

9 

3505 

7 

2769 

S  1 

2493 

11 

3933 

5 

3534 

10 

3301 

10 

3834 

8 

4097 

2 

2505 

12 

4608 

6 

4159 

11 

2330 

E  3 

2506 

9 

3203 

3 

3449 

W  4 

3340 

7 

4393 

12 

2651 

4 

2818 

10 

3179 

4 

3224 

5 

3887 

8 

3992 

E  3 

2478 

5 

3775 

S  1 

2668 

5 

3485 

6 

4097 

9 

4049 

4 

2665 

6 

3318 

2 

2766 

6 

3667 

7 

3440 

N  1 

2813 

5 

3033 

7 

3686 

3 

3280 

7 

3343 

8 

4503 

2 

2574 

6 

3205 

8 

3705 

4 

3295 

8 

3431 

9 

3806 

3 

3286 

7 

3282 

9 

3543 

5 

3844 

W  3 

2643 

10 

3939 

4 

3310 

8 

3229 

10 

3848 

6 

4022 

4 

3039 

N  1 

2902 

5 

3610 

9 

3137 

S  3 

2778 

7 

3575 

5 

3510 

2 

2869 

6 

3637 

10 

2693 

4 

2743 

8 

3784 

6 

3469 

3 

3610 

7 

3871 

S  1 

2128 

5 

3541 

9 

3621 

7 

3635 

4 

3547 

8 

3757 

2 

2200 

6 

3580 

11 

3698 

8 

4016 

5 

4012 

9 

3716 

3 

1977 

7 

3803 

W  3 

3032 

9 

3777 

6 

3919 

E  3 

3105 

4 

2498 

8 

3787 

4 

3132 

10 

3642 

7 

4585 

4 

3172 

5 

2732 

9 

3623 

5 

3781 

N  3 

3257 

8 

4553 

6 

3679 

6 

2920 

10 

3848 

6 

4141 

4 

3426 

9 

4235 

7 

3854 

7 

3102 

11 

3530 

7 

3730 

5 

4001 

10 

4495 

8 

3670 

8 

3050 

12 

3296 

8 

4162 

6 

3993 

11 

3694 

9 

3386 

9 

3230 

W  3 

2845 

9 

3559 

7 

4201 

12 

3492 

10 

3368 

10 

3053 

4 

3015 

10 

3532 

8 

4555 

E  3 

3173 

S  1 

2688 

11 

2993 

5 

3384 

N  1 

2296 

9 

3914 

4 

3879 

3 

3089 

12 

2518 

6 

3671 

2 

2458 

10 

3931 

5 

3751 

4 

3212 

W  3 

2938 

7 

3794 

3 

2794 

E  3 

3769 

6 

4197 

5 

3618 

4 

2272 

8 

3863 

4 

3075 

4 

3622 

7 

4110 

7 

3551 

5 

3144 

9 

3712 

5 

3166 

5 

4168 

8 

4061 

8 

3752 

6 

2904 

10 

3553 

6 

3255 

6 

4246 

9 

4589 

9 

3474 

7 

3314 

N  1 

2607 

7 

3233 

7 

4282 

10 

3762 

10 

3556 

8 

3448 

2 

2591 

8 

3600 

8 

4118 

11 

2733 

W  3 

3181 

9 

3468 

3 

3042 

9 

3471 

9 

3928 

S  4 

3071 

4 

3163 

10 

3289 

4 

2450 

10 

3735 

S  1 

3095 

5 

3886 

9 

3733 

11 

2456 

5 

3444 

11 

3329 

2 

3218 

6 

3873 

10 

3823 

12 

3078 

6 

3593 

7.38  Two lots of steers, 10 head in each lot, were used in a 90-day feeding
trial. Lot 1 received standard ration A. Lot 2 received special ration
K. Steers on ration A gained 1.84 lbs. per head per day, while the
animals fed K gained at the rate of 2.36 lbs. per head per day. Two
questions were of interest.

(a)  Will daily gains on ration K exceed 2 lbs. per day? The variance
of the mean gain, 2.36, was found to be .0144.

(b)  Is ration K better than standard ration A in producing gains?
The variance of the mean gain, 1.84, for lot 1 was .0256; thus,
we see that the pooled sum of squares for daily gain of the two lots
is 3.60.

Answer the two questions with the information given above. Why do
we use twice the pooled variance in examining the difference in gains
between the two lots, whereas in answering question (a) we use the
variance without such modification?

7.39  A sample of rural families and of urban families was taken to study
differences in coffee purchases by the two groups. The data obtained
are listed below in terms of pounds per family purchased annually.

Family No.    Rural    Urban
  1            12.1      8.3
  2             6.8      9.3
  3             9.1      9.2
  4            11.1     11.1
  5            11.4     10.7
  6            13.3      4.6
  7             9.8      9.8
  8            11.3      7.9
  9             9.4      8.5
 10            10.2      9.1
 11             9.7
 12             6.2

Would you attribute the difference in coffee consumption observed in
these samples to normal sampling fluctuation, or is there a real
difference between rural and urban coffee consumption? Select your own
level for control of the Type I error and draw your conclusion
accordingly. What is the specified population from which these data provide
you a sample? State your assumptions.

7.40  It has been suggested that the resistance of wire C is greater than the
resistance of wire D. The following data (in ohms) were obtained from
tests made on samples of each wire:

  C        D
0.140    0.135
0.138    0.140
0.143    0.142
0.142    0.136
0.144    0.137
0.139

Assuming that $\sigma_C^2 = \sigma_D^2$, test (using $\alpha = 0.01$) the hypothesis
$H: \mu_C \le \mu_D$ against the alternative $A: \mu_C > \mu_D$. State your conclusion
and interpret the results.

7.41  If the estimate of the population standard deviation from one sample
of 45 is 12, and a corresponding estimate from another sample of 45
is 18, are these samples consistent with the hypothesis that they are
from normal populations with the same variance?
7.42  Two methods of determining moisture content of samples of canned
corn have been proposed and both have been used to make
determinations on portions taken from each of 21 cans. Method I is easier to
apply but appears to be more variable than Method II. If the
variability of Method I were not more than 25 per cent greater than that of
Method II, we would prefer Method I. Based on the following sample
results, which method would you recommend?

$n_1 = n_2 = 21$, $\bar{Y}_1 = 50$, $\bar{Y}_2 = 53$, $\sum y_1^2 = 720$, $\sum y_2^2 = 340$. (Hint: Test
$H: \sigma_1^2 = 1.25\sigma_2^2$ against $A: \sigma_1^2 > 1.25\sigma_2^2$. Under this hypothesis
$(s_1^2/1.25)/s_2^2$ is distributed as $F(\nu_1, \nu_2)$, where
$\nu_1 = \nu_2 = n_1 - 1 = n_2 - 1 = 20$.)
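
The statistic described in the hint to Problem 7.42 can be evaluated as sketched below. The sketch assumes that 720 and 340 are sums of squared deviations about the respective sample means, so that each variance estimate is the sum divided by 20; the 5 per cent point 2.12 of F(20, 20) is likewise an assumed significance level taken from a standard table.

```python
ss_1, ss_2 = 720.0, 340.0   # assumed: sums of squared deviations, Methods I and II
df_1 = df_2 = 20            # n - 1 for each method

s2_1 = ss_1 / df_1          # variance estimate, Method I
s2_2 = ss_2 / df_2          # variance estimate, Method II

# Under H: sigma_1^2 = 1.25 sigma_2^2 this ratio follows F(20, 20).
f_stat = (s2_1 / 1.25) / s2_2

F_CRITICAL = 2.12  # assumed tabled upper 5 per cent point of F(20, 20)
print(f"s1^2 = {s2_1:.1f}, s2^2 = {s2_2:.1f}, F = {f_stat:.3f}")
print("exceeds critical value:", f_stat > F_CRITICAL)
```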

7.43  The amount of surface wax on each side of waxed paper bags is
believed to be normally distributed. However, there is reason to believe
that there is greater variation in the amount on the inner side of the
paper than on the outside. A sample of 25 observations of the amount
of wax on each side of these bags was obtained and the following data
recorded:

Wax in Pounds per Unit Area of Sample

Outside surface          Inside surface
$\bar{X} = 0.948$        $\bar{Y} = 0.652$
$\sum X^2 = 91$          $\sum Y^2 = 82$

Conduct a test (using $\alpha = 0.05$) of the hypothesis $H: \sigma_X^2 = \sigma_Y^2$ against
the alternative $A: \sigma_X^2 < \sigma_Y^2$.

7.44  Using the data of Problem 6.21 and letting $\alpha = 0.05$, test
$H: \sigma_1 = \sigma_2$ versus $A: \sigma_1 \neq \sigma_2$.

7.45  Using the data of Problem 6.22 and letting $\alpha = 0.01$, test
$H: \sigma_1 = \sigma_2$ versus $A: \sigma_1 \neq \sigma_2$.

7.46  Using the data of Problem 6.20 and letting $\alpha = 0.01$, test
$H: \sigma_1 \le \sigma_2$ versus $A: \sigma_1 > \sigma_2$.

7.47  Using the data of Problem 6.23 and letting $\alpha = 0.05$, test
$H: \sigma_A \le \sigma_B$ versus $A: \sigma_A > \sigma_B$.

7.48  A child psychologist, analyzing personality differences in children by
a projective technique, classified the responses of a group of 99
preschool children into three major types: static form of response, 23;
outer activity, 51; inner activity, 25. Do these data differ significantly
from a chance distribution of responses? Use $\alpha = .01$.
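
A goodness-of-fit calculation for Problem 7.48 might run as follows. The sketch assumes that a "chance distribution" means the three response types are equally likely, so that 33 responses are expected in each class; the 1 per cent point 9.210 of chi-square with 2 degrees of freedom is taken from a standard table.

```python
observed = [23, 51, 25]   # static form, outer activity, inner activity
total = sum(observed)

# Assumed chance model: each of the three types is equally likely.
expected = [total / len(observed)] * len(observed)

chi2 = sum((obs - exp) ** 2 / exp for obs, exp in zip(observed, expected))
df = len(observed) - 1

CHI2_CRITICAL = 9.210  # tabled 1 per cent point, 2 d.f.
print(f"chi-square = {chi2:.3f} with {df} d.f.")
print("exceeds critical value:", chi2 > CHI2_CRITICAL)
```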

7.49  A random sample of 147 women college students were interviewed
with regard to their habits concerning the purchase of clothing. The
source of each individual's income was also determined. Given the
data below and letting $\alpha = 0.10$, test the hypothesis that women
purchase clothing without planning in the following proportion:

Frequently    10 per cent
Seldom        80 per cent
Never         10 per cent

                                  Numbers Who Purchased Clothing Items
                                            Without Planning
Source of Income                  Frequently     Seldom     Never
Earned all of spending money           2           14         27
Earned part of spending money          8           17          5
Had regular allowance                  4           12          7
Money given as needed                 15           25         11

7.50  Referring to the data of Problem 7.49 and letting $\alpha = 0.01$, test the
hypothesis that frequency of purchasing clothing items without
planning is independent of source of income.

7.51  An experimenter testing three chemical treatments applied each to
200 randomly selected seeds, and then conducted germination tests.
The following results were obtained:

                     Number
Chemical    Germinating    Not Germinating
A               190              10
B               170              30
C               180              20

Test the hypothesis that the percentage of seeds germinating is
independent of the chemical used.

7.52  An experimenter fed different rations to three groups of chicks.
Assume that the chicks were assigned to the rations (groups) at random
and that all other management practices for the three groups were the
same. A record of mortality is given below. Would you attribute the
differences among the mortality rates of the three groups to rations?
Why?

               Number
Ration     Lived     Died
A            87       13
B            94        6
C            89       11

7.53  I selected a random sample of students at Arizona State University
and asked their opinions on a proposed radio program. The results are
given below. The same number of each sex was included within each
class group, that is, freshmen and sophomores each consisted of 100
men and 100 women, while juniors and seniors each consisted of 50
men and 50 women. Test the hypothesis that opinions are independent
of the class groupings.


                         Number
Class         Favoring Program    Opposed to Program
Freshmen            120                  80
Sophomores          130                  70
Juniors              70                  30
Seniors              80                  20

7.54  An agency engaged in market research conducted some of its sampling
by mail. For one survey these results in terms of response to
successive mailings were obtained:

Response No.:    1st    2nd    3rd    4th    Original Mailing
Returns:         150     60     40     20         1000

Another agency obtained the following results in a mail sampling of a
similar population:

Response No.:    1st    2nd    3rd    4th    Original Mailing
Returns:         200     30     50     25          800

Does it appear that the two mail samplings were homogeneous in
eliciting replies from the two populations?

7.55  In a large city the division of the voting strength between two
candidates for mayor appeared to be about equal. The campaign manager
for candidate A polled a random sample of 2500 voters two weeks
before the election. In this sample 1313 of the voters indicated they
would vote for A. If the sample is representative of the population of
voters in this city, is it likely that A will be elected? Establish 99 per
cent confidence limits for the proportion of voters favoring A.

7.56  An opinion-polling agency reported the distribution of a sample in this
manner:

Republicans    Democrats    Independents    Total
    400           450           150         1000

A newspaper poll in the same area yielded this distribution in terms
of declared political opinion of respondents:

Republicans    Democrats    Independents    Total
    300           325            75          700

Are these two samples homogeneous with regard to division of
political opinion?

7.57  The following data were obtained from a random sampling of the
records of a specific company.

NUMBER OF BREAKDOWNS

                            Machine
                      A      B      C      D     Total per Shift
Shift 1              10      6     12     13          41
Shift 2              10     12     19     21          62
Shift 3              13     10     13     18          54
Total per machine    33     28     44     52

Test the hypothesis that the number of breakdowns on each machine
is independent of the shift. Use $\alpha = 0.05$.

7.58  Road tests gave the data shown below regarding tire failures. Letting
$\alpha = 0.05$, test the hypothesis that left-right tire wear is independent of
front-rear tire wear.

NUMBER OF FAILURES

           Front    Rear    Totals
Left        115      65      180
Right       125      95      220
Totals      240     160      400

7.59  A car rental firm has a particular car that has experienced 13
breakdowns in the past year. Using a Poisson distribution and letting
$\alpha = 0.01$, test $H: \mu \le 10$ versus $A: \mu > 10$.

7.60  Ignoring the correction for continuity in Equation (7.10), that is,
dropping the adjustment of $-0.5$, show that
$\chi^2 = \frac{(a - rb)^2}{r(a + b)}$,
where a and b are the observed numbers in the two classes and r
equals the hypothesized ratio of type A to type B.

7.61  Work the preceding problem using the correction for continuity and
show that $\chi^2 = \frac{\left(|a - rb| - (r + 1)/2\right)^2}{r(a + b)}$.

7.62  Rework the problems noted below, using the method described in
Section 7.20:

(a) 7.26
(b) 7.27
(c) 7.28
(d) 7.30
(e) 7.31
(f) 7.37
(g) 7.39


7.63     Given the following data (three random samples from three normal
populations) and assuming homogeneous variances, test the hypothesis
$H: \mu_1 = \mu_2 = \mu_3$. Let $\alpha = 0.10$.


Sample  1  Sample  2  Sample  3 


48 

72 

48 

24 

24 

12 

36 

48 

24 

48 

7.64  Assuming homogeneous variances, test the hypothesis that the four
normal populations, from which the following random samples were
obtained, have the same mean. Let $\alpha = 0.025$.

Sample  1  Sample  2  Sample  3  Sample  4 


95 

45 

95 

20 

50 

40 

130 

55 

105 

95 

15 

50 

10 

65 

135 

80 

60 

45 

125 

7.65  Using the data of Table 7.23 and assuming homogeneous variances,
test the hypothesis $H: \mu_1 = \mu_2 = \mu_3 = \mu_4$. Let $\alpha = 0.01$.

7.66  Using the data of Table 7.20, test the hypothesis
$H: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$. Let $\alpha = 0.05$.

7.67  Letting $\alpha = 0.10$, test the hypothesis of homogeneous variances for
each of the following problems: (a) 7.63 and (b) 7.64.

CHAPTER 8

REGRESSION ANALYSIS

THE METHODS OF ANALYSIS studied thus far in this text have been
concerned with data on only one characteristic associated with the
experimental units. That is, in any given problem we have been working
with only one variable. However, as you will realize, many problems
involve more than one variable. Consequently, it is necessary that
techniques developed for analyzing multivariate problems be studied.
Some of these techniques will be investigated in this chapter.

8.1      FUNCTIONAL   RELATIONS  AMONG  VARIABLES 

When we possess information on two or more related (or concomitant)
variables, it is natural to seek a way of expressing the form of the
functional relationship. In addition, it is desirable to know the
strength of the relationship. That is, not only do we seek a
mathematical function which tells us how the variables are
interrelated, but also we wish to know how precisely the value of one
variable can be predicted if we know the values of the associated
variables. The techniques used to accomplish these two objectives are
known as regression methods and correlation methods. Regression methods
are those used to determine the "best" functional relation among the
variables, while correlation methods are used to measure the degree to
which the different variables are associated.

More specific statements will be forthcoming in succeeding sections.
For the moment, it will suffice to say that the functional relationships
will, in general, be represented mathematically by

$$\eta = \phi(X_1, \ldots, X_p \mid \theta_1, \ldots, \theta_q) \tag{8.1}$$

where

$\eta$ = the response (or dependent) variable,
$X_i$ = the $i$th independent variable ($i = 1, \ldots, p$),
$\theta_j$ = the $j$th parameter in the function ($j = 1, \ldots, q$),

and $\phi$ stands for the assumed form of the function. Equation (8.1) is
sometimes written as

$$\eta = \phi(X_1, \ldots, X_p). \tag{8.2}$$

When this abbreviated form is used, one should always remember that
the parameters belong in the expression; they have been omitted solely
to achieve brevity. In the language of statistics, a function such as
specified by Equation (8.1) is known as a regression function. However,
in some areas of application, a more natural (and common) expression
is response function. In this book both expressions will be used, the
choice being dictated by the context.

8.2  A WORD OF CAUTION ABOUT FUNCTIONAL RELATIONS

In any analysis it is hoped that the postulated (assumed) function
represents some basic, or causal, mechanism associated with the
experimental units and the factors under investigation. However, science
is not always so far advanced that the basic variables and the basic
mechanisms of a process are known with certainty. In such cases, the
methods of regression and correlation may still prove useful as analytic
and predictive tools.

Because of the frequent uncertainty about basic variables and basic
mechanisms, a word of warning must be sounded relative to the
interpretation of analyses involving concomitant variables. This warning
is: Just because a particular functional relationship has been assumed
and a specific computational procedure followed, do not assume that a
causal relationship exists among the variables. That is, because a
function has been found that is a good fit to a set of observed data, we
are not necessarily in a position to infer that a change in one variable
causes a change in another variable.

In summary, the only person who can safely say that the basic
variables are those used and that the basic mechanism operates in
accordance with the selected mathematical function is a person well
trained in the subject matter field in which the experiment was
performed. The statistical analysis (in this instance, a regression
and/or correlation analysis) is only a tool to aid him in the analysis
and interpretation of data.

8.3  THE  CHOICE  OF  A   FUNCTIONAL    RELATION 

How does an analyst go about choosing a particular functional
relationship as representative of the population under investigation?
Two methods are employed. These are: (1) an analytical consideration of
the phenomenon concerned, and (2) an examination of scatter diagrams
plotted from the observed data. While the first method is preferred,
the second should not be underrated. If little is known about the basic
mechanisms involved, the use of scatter diagrams can be quite helpful.

8.4     CURVE   FITTING

Once we have decided on the type of mathematical function that
best seems to fit, or represent, our concept of the exact relationship
existing among the variables, the problem of choosing a particular
member of this family of functions arises. That is, a certain function
has been postulated as being an expression of the true state of affairs
in the population, and it is now necessary to estimate the parameters
of this function. The determination of these estimates and thus the
specification of a particular function is commonly referred to as curve
fitting.

How do we go about fitting a curve to a set of data? That is, how are
the estimates of the parameters obtained? Again we are faced with the
problem of choosing among several methods of estimation. The approach
taken should, of course, provide us with the "best" estimates.
Since this is simply another part of the general problem of estimation,
the criteria by which estimators are selected will be similar to those
outlined in Chapter 6.

8.5      THE    METHOD   OF    LEAST  SQUARES 

To proceed to the method of estimation, it should be noted that
there are several methods outlined in the literature, all of which give
acceptable answers. For our purposes it will be sufficient to discuss the
method of least squares. By the use of this method excellent results may
be obtained. In fact, if the usual assumption of normality is made, the
method of least squares becomes equivalent to that of maximum
likelihood.

To study the method of least squares, assume that we are considering
a certain characteristic ($\eta$) which is related to or depends on
certain other characteristics ($X_1, \ldots, X_p$) according to the
relationship

$$\eta = \phi(X_1, \ldots, X_p \mid \theta_1, \ldots, \theta_q). \tag{8.3}$$

Both the form of the function and the values of the parameters must
be determined. (NOTE: In practice, the form is usually assumed to be
known and thus the problem reduces to estimating the parameter
values.)

The reason that the parameter values cannot be determined without
error is that the observed values of the dependent variable will seldom
agree with the expected values. That is, even if we can control the X
values (or measure them without error) the observed value of the
dependent variable, denoted by $Y$, will not equal the expected value,
$\eta$. This is expressed by

$$Y = \eta + \epsilon = \phi(X_1, \ldots, X_p \mid \theta_1, \ldots, \theta_q) + \epsilon \tag{8.4}$$

where $\epsilon$ stands for the error made in attempting to observe
$\eta$. Many factors contribute to the value of $\epsilon$, but it seems
reasonable to assume that it ($\epsilon$) is a random variable with mean
0 and variance $\sigma_E^2$. Under these conditions we must be content
with estimating the unknown parameters, namely, the $\theta$'s and
$\sigma_E^2$.

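To see how the method of least squares operates, consider the data
of Table 8.1. Denoting the estimator of $\theta_j$ by $\hat\theta_j$
($j = 1, \ldots, q$), form the $n$ differences

$$
\begin{aligned}
Y_1 - \phi(X_{11}, \ldots, X_{p1} \mid \hat\theta_1, \ldots, \hat\theta_q) &= Y_1 - \hat{Y}_1 \\
Y_2 - \phi(X_{12}, \ldots, X_{p2} \mid \hat\theta_1, \ldots, \hat\theta_q) &= Y_2 - \hat{Y}_2 \\
&\;\;\vdots \\
Y_n - \phi(X_{1n}, \ldots, X_{pn} \mid \hat\theta_1, \ldots, \hat\theta_q) &= Y_n - \hat{Y}_n
\end{aligned}
\tag{8.5}
$$

TABLE 8.1. Symbolic Representation of n Observations on p + 1 Variables

Dependent Variable        Independent Variables
        Y                 X_1      X_2     ...     X_p
       Y_1                X_11     X_21    ...     X_p1
       Y_2                X_12     X_22    ...     X_p2
        .                  .        .               .
        .                  .        .               .
       Y_n                X_1n     X_2n    ...     X_pn

The values of the $\hat\theta_j$ ($j = 1, \ldots, q$) are then determined
by minimizing the sum of the squares of the deviations specified by
Equation (8.5). That is, the $\hat\theta_j$ are found by minimizing

$$S = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \left[ Y_i - \phi(X_{1i}, \ldots, X_{pi} \mid \hat\theta_1, \ldots, \hat\theta_q) \right]^2. \tag{8.6}$$

This is a familiar problem in calculus: $S$ is differentiated with respect
to each of the estimators, and each partial derivative is set equal to 0.
Symbolically,

$$\frac{\partial S}{\partial \hat\theta_j} = 0, \qquad j = 1, \ldots, q, \tag{8.7}$$

and this system of equations must be solved for the estimates, that is,
for the $\hat\theta_j$.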
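
As an illustration of Equations (8.6) and (8.7), consider the special case in which the assumed function is a straight line, $\phi = \theta_1 + \theta_2 X$. This choice is made here only for the sake of a concrete sketch; the general method does not depend on it.

```latex
\begin{aligned}
S &= \sum_{i=1}^{n} \left( Y_i - \hat\theta_1 - \hat\theta_2 X_i \right)^2 , \\
\frac{\partial S}{\partial \hat\theta_1}
  &= -2 \sum_{i=1}^{n} \left( Y_i - \hat\theta_1 - \hat\theta_2 X_i \right) = 0 , \\
\frac{\partial S}{\partial \hat\theta_2}
  &= -2 \sum_{i=1}^{n} X_i \left( Y_i - \hat\theta_1 - \hat\theta_2 X_i \right) = 0 ,
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
n \hat\theta_1 + \hat\theta_2 \sum X &= \sum Y , \\
\hat\theta_1 \sum X + \hat\theta_2 \sum X^2 &= \sum XY .
\end{aligned}
```

The two equations on the right are linear in the estimates, so in this case the system can be solved directly.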

8.6      GRAPHICAL INTERPRETATION OF THE METHOD OF LEAST SQUARES

In order to portray graphically the concepts of the method of least
squares, it will be convenient to restrict ourselves to the case of two
related variables. This is because cases involving three or more
variables present great difficulties in graphic presentation. A
two-variable scatter diagram might appear as in Figure 8.1.

FIG. 8.1. Example of a scatter diagram, often referred to as a
scattergram.

Suppose that the functional relationship assumed to exist between
$\eta$ and $X$ is that specified by the mathematical model

$$\eta = \theta_1 + \theta_2 X. \tag{8.8}$$

The associated statistical model is

$$Y = \theta_1 + \theta_2 X + \epsilon. \tag{8.9}$$

The method of least squares would then be used to obtain a regression
(or estimating or predicting) equation.¹ Since Equation (8.8) is the
equation of a straight line, the regression equation would also turn
out to be a straight line such as shown in Figure 8.2. The resulting
regression equation would be denoted by

$$\hat{Y} = \hat\theta_1 + \hat\theta_2 X \tag{8.10}$$

where $\hat\theta_j$ estimates $\theta_j$ ($j = 1, 2$) and $\hat{Y}$
estimates both $Y$ and $\eta$.

FIG. 8.2. Example of a scatter diagram with a straight line inserted
showing the vertical deviations whose sum of squares is to be
minimized by the proper choice of straight line.

If a more complicated functional relationship had been assumed,
for example,

$$\eta = \theta_1 + \theta_2 X + \theta_3 X^2, \tag{8.11}$$

the problem would be somewhat more complex to handle mathematically
but the principle would be unchanged. The parameters would still
be estimated by minimizing the sum of the squares of the vertical
deviations about the appropriate curve. The reader's attention is
directed to Figure 8.3 for an illustration of a situation associated with
the mathematical model specified in Equation (8.11).

FIG. 8.3. Example of a scatter diagram with a second degree polynomial
inserted showing the vertical deviations whose sum of squares is
to be minimized by the proper choice of parabola.

¹ The terms regression equation, estimating equation, and prediction
equation are used interchangeably.
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8.7      SIMPLE    LINEAR    REGRESSION 

Let us now consider in detail the case in which the postulated
functional relationship is of the form²

$$\eta = \beta_0 + \beta_1 X \tag{8.12}$$

or

$$Y = \beta_0 + \beta_1 X + \epsilon. \tag{8.13}$$

The problem is, of course, to estimate $\beta_0$ and $\beta_1$ from the
observed sample data. That is, estimates of $\beta_0$ and $\beta_1$,
denoted by $b_0$ and $b_1$, must be found. Using the method of least
squares, the estimates $b_0$ and $b_1$ are determined by solving the
normal equations³

$$
\begin{aligned}
n b_0 + b_1 \sum X &= \sum Y \\
b_0 \sum X + b_1 \sum X^2 &= \sum XY.
\end{aligned}
\tag{8.14}
$$

The solutions are

$$b_1 = \sum xy \Big/ \sum x^2 \tag{8.15}$$

and

$$b_0 = \bar{Y} - b_1 \bar{X} \tag{8.16}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$. These estimates are then
used to give us the regression equation

$$\hat{Y} = b_0 + b_1 X. \tag{8.17}$$

The presentation of a worked example will be deferred until Section 8.9.

² The symbols $\beta_0$, $\beta_1$, $b_0$, and $b_1$ are used here rather
than $\theta_0$, $\theta_1$, $\hat\theta_0$, and $\hat\theta_1$. This is
in conformance with general usage. Some authors prefer $\alpha$, $\beta$,
$a$, and $b$ rather than $\beta_0$, $\beta_1$, $b_0$, and $b_1$. However,
the latter are better suited to extensions to three or more variables.

³ The phrase "normal equations" is used to describe the equations
resulting from the least squares differentiation. It has no connection
with the normal distribution.

8.8      PARTITIONING THE SUM OF SQUARES OF THE DEPENDENT VARIABLE

Regression computations may also be looked upon as a process for
partitioning the total sum of squares, $\sum Y^2$, into three parts, each
of which is meaningful and useful. Prior to this chapter, the total sum
of squares was shown to be the sum of two quantities, the corrected⁴
sum of squares and the correction for the mean:

Total S.S. = Correction for the mean + Corrected S.S.
           = S.S. due to the mean + S.S. of deviations about the mean.    (8.18)

That is,

$$\sum Y^2 = \left( \sum Y \right)^2 / n + \sum y^2. \tag{8.19}$$


Using regression methods, the corrected sum of squares may also be
subdivided into two parts, the sum of squares due to (simple linear)
regression and the sum of squares of the deviations about regression:

Corrected S.S.
    = S.S. due to regression + S.S. of deviations about regression.      (8.20)

That is,

$$\sum y^2 = b_1 \sum xy + \sum (Y - \hat{Y})^2. \tag{8.21}$$

Substituting Equation (8.21) in Equation (8.19) gives

$$\sum Y^2 = \left( \sum Y \right)^2 / n + b_1 \sum xy + \sum (Y - \hat{Y})^2. \tag{8.22}$$

Expressing this in words,

Total S.S. = (S.S. due to the mean) + (S.S. due to regression)
             + (S.S. of deviations about regression).                    (8.23)

More properly, this result should be stated as

Total S.S. = (S.S. due to $b_0$) + (S.S. due to $b_1 \mid b_0$)
             + (Residual S.S.).                                          (8.24)

A more extensive discussion of this type of manipulation and of the
associated notation is given in Sections 8.15 and 8.16. For the present
it is recommended that the reader make an effort to learn the notation
and the manipulative skills involved. The acquisition of such knowledge
will prove most helpful in the remainder of this book.

Graphically, each of the indicated partitions of the total sum of
squares can be associated with the sums of squares of segments of the
Y-ordinates. This is illustrated in Figure 8.4, where the ordinate $Y_0$,
associated with $X_0$, is partitioned according to the identity

$$Y = \bar{Y} + (\hat{Y} - \bar{Y}) + (Y - \hat{Y}). \tag{8.25}$$

In words, this says that

Observed Y = (Contribution due to the mean)
             + (An additional contribution due to regression)
             + (Deviation from regression).                              (8.26)

FIG. 8.4. Diagram to illustrate the partitioning of the total sum of
squares.

⁴ The expression "corrected sum of squares" is used to represent the
total sum of squares minus the adjustment (or correction) for the mean.
That is, it is simply a synonym for the sum of the squares of the
deviations about the mean.

If we carry through the algebraic manipulations without error,
Equation (8.22) may be derived from Equation (8.25). The proof of this is
left as an exercise for the reader.
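
For readers who wish to check their work, the main steps can be sketched as follows. The sketch uses only the definitions $x = X - \bar{X}$, $y = Y - \bar{Y}$ and the least squares slope $b_1 = \sum xy / \sum x^2$; it is an outline rather than a complete proof.

```latex
\begin{aligned}
Y - \bar{Y} &= (\hat{Y} - \bar{Y}) + (Y - \hat{Y}),
  \qquad \hat{Y} - \bar{Y} = b_1 x, \qquad Y - \hat{Y} = y - b_1 x, \\
\sum y^2 &= b_1^2 \sum x^2 + \sum (Y - \hat{Y})^2 + 2 b_1 \sum x \,(y - b_1 x) \\
         &= b_1 \sum xy + \sum (Y - \hat{Y})^2
            + 2 b_1 \Bigl( \sum xy - b_1 \sum x^2 \Bigr) \\
         &= b_1 \sum xy + \sum (Y - \hat{Y})^2 ,
\end{aligned}
```

The cross-product term vanishes because $b_1 \sum x^2 = \sum xy$. Adding the correction for the mean, $(\sum Y)^2 / n$, to both sides then gives the three-part partition of $\sum Y^2$.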

When each partition of $\sum Y^2$ is associated with a corresponding
portion of the total degrees of freedom, the technique is known as
analysis of variance. Such results are usually presented in tabular
form, referred to as an analysis of variance table. This is illustrated
in Table 8.2. The first line of this table is frequently omitted and the
total line expressed as $\sum y^2 = \sum Y^2 - (\sum Y)^2 / n$ with $n - 1$
degrees of freedom. However, in this book the results will always be
presented in the form used in Table 8.2.

TABLE 8.2. General Analysis of Variance for Simple Linear Regression

Source of            Degrees of
Variation            Freedom      Sum of Squares         Mean Square

Due to b_0           1            (ΣY)²/n                (ΣY)²/n
Due to b_1 | b_0     1            b_1 Σxy                b_1 Σxy
Residual             n - 2        Σ(Y - Ŷ)²              Σ(Y - Ŷ)²/(n - 2)

Total                n            ΣY²

8.9       A    PRACTICAL    EXAMPLE 



To illustrate the methods discussed in the preceding sections,
consider the data in Table 8.3.

TABLE 8.3. Schopper-Riegler Freeness Test of Paper Pulp During Beating

Hours of    Schopper-Riegler    Hours of    Schopper-Riegler
Beating       (in degrees)      Beating       (in degrees)
   X               Y               X               Y

   1              17               8              64
   2              21               9              80
   3              22              10              86
   4              27              11              88
   5              36              12              92
   6              49              13              94
   7              56

Source: O. L. Davies, Statistical Methods in Research and Production,
Oliver and Boyd, Edinburgh, 1949, p. 161. By permission of the author
and publishers.

Following the methods outlined in the preceding sections, we obtain

$$
\begin{aligned}
13 b_0 + 91 b_1 &= 732 \\
91 b_0 + 819 b_1 &= 6485
\end{aligned}
$$

which yields $\hat{Y} = 3.962 + 7.478X$ as the regression equation. It is
also observed that

$$
\begin{aligned}
\sum Y^2 &= 51{,}712 \\
\left( \sum Y \right)^2 / n &= 41{,}217.23 \\
\sum y^2 &= 10{,}494.77 \\
b_1 \sum xy &= 10{,}177.59 \\
\sum (Y - \hat{Y})^2 &= 317.18
\end{aligned}
$$

and these results are presented in analysis of variance form in Table
8.4. (NOTE: A convenient form to use in performing the calculations
is given in Table 8.5.)

The estimated function is pictured in Figure 8.5. Examination of
Figure 8.5 will suggest that a cubic equation (i.e., a third-degree
polynomial) would be a better fit to the observed data. However, a
discussion of the desirability or appropriateness of fitting a different
function and of methods of fitting other than a simple linear function
will be deferred until later.

TABLE 8.4. Analysis of Variance of the Schopper-Riegler Data of Table 8.3

                       Degrees of
Source of Variation    Freedom      Sum of Squares    Mean Square

Due to b_0             1            41,217.23         41,217.23
Due to b_1 | b_0       1            10,177.59         10,177.59
Residual               11              317.18             28.83

Total                  13           51,712.00

TABLE 8.5. Suggested Form for Calculation and Presentation of Results
in Simple Linear Regression

n =             ΣX =            ΣY =
                X̄ =             Ȳ =
ΣX² =           ΣXY =           ΣY² =
Σx² =           Σxy =           Σy² =

Source of              Degrees of    Sum of     Mean
Variation              Freedom       Squares    Square

Due to b_0
Due to b_1 | b_0
Residual

Total
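
The arithmetic of Section 8.9 can be checked with a short program. The following is a minimal sketch in Python, assuming only the data of Table 8.3; the function name `simple_linear_regression` and the dictionary keys are illustrative choices made here, not notation from the text.

```python
# Data of Table 8.3: hours of beating (X) and Schopper-Riegler degrees (Y).
X = list(range(1, 14))
Y = [17, 21, 22, 27, 36, 49, 56, 64, 80, 86, 88, 92, 94]


def simple_linear_regression(xs, ys):
    """Least squares line and sum-of-squares partition (illustrative helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_yy = sum(y * y for y in ys)

    # Corrected sums of squares and products (deviations about the means).
    sxx = sum_xx - sum_x ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n

    b1 = sxy / sxx                    # slope, Equation (8.15)
    b0 = sum_y / n - b1 * sum_x / n   # intercept, Equation (8.16)

    ss_mean = sum_y ** 2 / n          # S.S. due to b0
    ss_regression = b1 * sxy          # S.S. due to b1 | b0
    ss_residual = sum_yy - ss_mean - ss_regression
    return {
        "b0": b0,
        "b1": b1,
        "ss_total": sum_yy,
        "ss_mean": ss_mean,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "ms_residual": ss_residual / (n - 2),
    }


fit = simple_linear_regression(X, Y)
for name, value in fit.items():
    print(f"{name:14s} {value:12.3f}")
```

The printed sums of squares should agree with the entries of Table 8.4 to the two decimals shown there, apart from rounding.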


8.10  ASSUMPTIONS NECESSARY FOR ESTIMATION AND TESTING HYPOTHESES
IN SIMPLE LINEAR REGRESSION

Before we can construct confidence intervals or specify test
procedures, certain assumptions are generally made. Thus, in addition to
the assumption of "no error" in the independent variable made in
Section 8.5, the usual assumptions are:

(1)  For a given X, the Y's are normally and independently distributed
about a mean $\mu_{Y|X} = \eta = \beta_0 + \beta_1 X$ with variance
$\sigma^2_{Y|X} = \sigma^2_{Y \cdot X}$. The assumption concerning the
mean is equivalent to assuming that $\epsilon$ is normally and
independently distributed with mean 0, where $\epsilon$ is defined by
Equation (8.13).

(2)  The variance $\sigma^2_{Y|X}$ is the same for each X and can
therefore be denoted by $\sigma_E^2$. The subscript E is used because
$\sigma_E^2$ is the variance of the errors denoted by $\epsilon$. It
($\sigma_E^2$) is commonly referred to as the variance of the "errors of
estimate."

The preceding assumptions are summarized by

$$Y_{ij} = \beta_0 + \beta_1 X_i + \epsilon_{ij}, \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n_i, \tag{8.27}$$

where

$n_i$ = the number of values of Y associated with the $i$th value of X,
$\sum_{i=1}^{k} n_i = n$ = the total number of values of Y (or of X),

and the $\epsilon_{ij}$ are normally and independently distributed with
mean zero and standard deviation $\sigma_E$. This last phrase is
frequently abbreviated to "the $\epsilon_{ij}$ are NID (0, $\sigma_E$)."

FIG. 8.5. Plot of data in Table 8.3 with the least squares line
$\hat{Y} = 3.962 + 7.478X$ inserted (horizontal axis: hours of beating).

8.11      ESTIMATES OF ERROR ASSOCIATED WITH SIMPLE LINEAR REGRESSION
ANALYSES

Granting the assumptions of Sections 8.5 and 8.10, and further
assuming that the failure of the assumed model to fit the observations
exactly is solely a function of the errors, the mean square for
deviations about regression (i.e., the residual mean square) can be used
as an estimate of $\sigma_E^2$. Symbolically,

$$\text{residual mean square} = \sum (Y - \hat{Y})^2 / (n - 2) = s_E^2. \tag{8.28}$$

We must always remember, though, that such an estimate can be
badly inflated if the assumed mathematical model is inadequate. More
will be said on this subject a little later.

Once we have determined the variance estimate $s_E^2$, it is a
straightforward matter to obtain estimates of the variances of various
statistics calculated in the regression analysis. Those of general
interest will be given without derivation.

Estimated variance of the regression coefficient $b_1$

$$s_{b_1}^2 = s_E^2 \Big/ \sum x^2 \tag{8.29}$$

Estimated variance of estimated mean Y for given X

$$s_{\hat{Y}}^2 = s_E^2 \left[ \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum x^2} \right] \tag{8.30}$$

Estimated variance of predicted individual Y for given X

$$s_{\hat{Y}}^2 = s_E^2 \left[ 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum x^2} \right] \tag{8.31}$$

Estimated variance of the regression coefficient $b_0$

$$s_{b_0}^2 = s_E^2 \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum x^2} \right] \tag{8.32}$$

These may then be used to test hypotheses about, or to provide interval
estimates of, various unknown parameters.

8.12     CONFIDENCE     AND     PREDICTION      INTERVALS      IN 
SIMPLE   LINEAR    REGRESSION 

In most linear regression problems the estimator of greatest
importance is the slope $b_1$. This is, of course, an estimator of
$\beta_1$. To provide a $100\gamma$ per cent confidence interval for
$\beta_1$, we compute

$$\left. \begin{matrix} L \\ U \end{matrix} \right\} = b_1 \mp t_{[(1+\gamma)/2](n-2)} \, s_{b_1} \tag{8.33}$$

where $s_{b_1}$ is defined by Equation (8.29).

Example 8.1

Using the data in Table 8.3 and the results of Section 8.9, it may
be verified that

$$s_E^2 = 317.18 / 11 = 28.83$$

and

$$s_{b_1}^2 = 28.83 / 182 = 0.1584.$$

Thus, a 95 per cent confidence interval for $\beta_1$ is determined to be
(6.602, 8.354).

If a $100\gamma$ per cent confidence interval estimate of $\beta_0$ is
needed, we have only to compute

$$\left. \begin{matrix} L \\ U \end{matrix} \right\} = b_0 \mp t_{[(1+\gamma)/2](n-2)} \, s_{b_0} \tag{8.34}$$

where $s_{b_0}$ is defined by Equation (8.32). No example will be given;
however, some of the problems will require use of Equation (8.34).
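
The interval of Example 8.1 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the summary figures are those of Section 8.9, the value 2.201 is the tabulated Student's t point for 11 degrees of freedom and a two-sided 95 per cent interval, and the variance of the intercept is taken in its usual form $s_E^2 (1/n + \bar{X}^2 / \sum x^2)$. The helper name `interval` is an illustrative choice.

```python
import math

# Summary statistics from Section 8.9 (Table 8.3 data).
n = 13
x_bar = 7.0              # mean of X, 91/13
sum_x_sq_dev = 182.0     # sum of squared deviations of X about its mean
b0, b1 = 3.962, 7.478    # least squares estimates
s2_e = 317.18 / (n - 2)  # residual mean square, Equation (8.28)

# Tabulated t value for 11 degrees of freedom, two-sided 95 per cent
# interval (read from a t table, not computed here).
T_975_11 = 2.201


def interval(estimate, variance, t_value):
    """Return (lower, upper) = estimate -/+ t * (standard error)."""
    half_width = t_value * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


var_b1 = s2_e / sum_x_sq_dev                            # variance of slope
var_b0 = s2_e * (1.0 / n + x_bar ** 2 / sum_x_sq_dev)   # variance of intercept

ci_b1 = interval(b1, var_b1, T_975_11)
ci_b0 = interval(b0, var_b0, T_975_11)
print("slope:     (%.3f, %.3f)" % ci_b1)   # (6.602, 8.354), as in Example 8.1
print("intercept: (%.3f, %.3f)" % ci_b0)
```

For these data the interval for the intercept is much wider than that for the slope and includes zero, which reflects the fact that X = 0 lies outside the observed range of X.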

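It is also possible that we might wish to determine a confidence
region for the simultaneous estimation of $\beta_0$ and $\beta_1$. Making
use of the fact that

$$\left[ n(b_0 - \beta_0)^2 + 2n\bar{X}(b_0 - \beta_0)(b_1 - \beta_1) + (b_1 - \beta_1)^2 \sum X^2 \right] \Big/ \sigma_E^2 \tag{8.35}$$

is distributed as $\chi^2_{(2)}$ and that $(n - 2) s_E^2 / \sigma_E^2$ is
distributed as $\chi^2_{(n-2)}$, it is seen that

$$\left[ n(b_0 - \beta_0)^2 + 2n\bar{X}(b_0 - \beta_0)(b_1 - \beta_1) + (b_1 - \beta_1)^2 \sum X^2 \right] \Big/ 2 s_E^2 \tag{8.36}$$

is distributed as F with $\nu_1 = 2$ and $\nu_2 = n - 2$ degrees of
freedom. The boundary of the $100\gamma$ per cent confidence region is
then determined by solving

$$\left[ n(b_0 - \beta_0)^2 + 2n\bar{X}(b_0 - \beta_0)(b_1 - \beta_1) + (b_1 - \beta_1)^2 \sum X^2 \right] \Big/ 2 s_E^2 = F_{\gamma(2,\, n-2)} \tag{8.37}$$

for $\beta_0$ and $\beta_1$.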
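
Rather than solving for the boundary, one can test whether a trial pair of values lies inside the region. The sketch below does this for the data of Table 8.3, assuming the quadratic form $n(b_0 - \beta_0)^2 + 2\sum X\,(b_0 - \beta_0)(b_1 - \beta_1) + \sum X^2\,(b_1 - \beta_1)^2$ divided by $2 s_E^2$, with $2\sum X = 2n\bar{X}$. The value 3.98 is the tabulated upper 5 per cent point of F with 2 and 11 degrees of freedom; the function names are illustrative.

```python
# Summary statistics from Section 8.9 (Table 8.3 data).
n = 13
sum_x = 91.0       # sum of X, equal to n times the mean of X
sum_x_sq = 819.0   # uncorrected sum of squares of X
b0, b1 = 3.962, 7.478
s2_e = 317.18 / (n - 2)   # residual mean square

# Tabulated upper 5 per cent point of F with 2 and 11 degrees of freedom
# (read from an F table, not computed here).
F_95_2_11 = 3.98


def region_statistic(beta0, beta1):
    """Quadratic form in (b0 - beta0, b1 - beta1) divided by 2 * s2_e."""
    d0 = b0 - beta0
    d1 = b1 - beta1
    quadratic = n * d0 ** 2 + 2.0 * sum_x * d0 * d1 + sum_x_sq * d1 ** 2
    return quadratic / (2.0 * s2_e)


def in_region(beta0, beta1, f_value=F_95_2_11):
    """True if the trial point lies inside the 95 per cent joint region."""
    return region_statistic(beta0, beta1) <= f_value


for point in [(3.962, 7.478), (0.0, 8.0), (3.962, 9.0)]:
    print(point, round(region_statistic(*point), 3), in_region(*point))
```

The region is an ellipse whose long axis runs from high intercept and low slope to low intercept and high slope, so a pair such as (0.0, 8.0) can lie inside it while (3.962, 9.0) does not.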

Another estimation problem of importance in simple linear regression
is associated with $\hat{Y} = b_0 + b_1 X$. As you will remember,
$\hat{Y} = b_0 + b_1 X$ is an estimator of
$\mu_{Y|X} = \eta = \beta_0 + \beta_1 X$. Further, by the assumptions of
Section 8.10, $\eta$ is the mean of a normal population. Thus, it should
not be surprising that a $100\gamma$ per cent confidence interval
estimate of $\eta$ is provided by

$$\left. \begin{matrix} L \\ U \end{matrix} \right\} = \hat{Y} \mp t_{[(1+\gamma)/2](n-2)} \, s_{\hat{Y}} \tag{8.38}$$

where $s_{\hat{Y}}$ is defined by Equation (8.30).

It should be noted that $\hat{Y} = b_0 + b_1 X$ is also a predictor of
$Y = \beta_0 + \beta_1 X + \epsilon$. That is, $\hat{Y}$ can also be used
to predict an individual Y-value associated with a given X-value. (NOTE:
This is in contrast to the preceding paragraph where $\hat{Y}$ was used
to estimate the mean of a normal population.) When $\hat{Y}$ is used to
predict an individual value rather than a mean value, a $100\gamma$ per
cent prediction interval is provided by

$$\left. \begin{matrix} L \\ U \end{matrix} \right\} = \hat{Y} \mp t_{[(1+\gamma)/2](n-2)} \, s_{\hat{Y}} \tag{8.39}$$

where $s_{\hat{Y}}$ is here defined by Equation (8.31).
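
The difference between the two kinds of interval is easily seen numerically. The following sketch computes both at several values of X for the data of Table 8.3, assuming the usual forms $s_E^2[1/n + (X - \bar{X})^2/\sum x^2]$ for the estimated mean and $s_E^2[1 + 1/n + (X - \bar{X})^2/\sum x^2]$ for a predicted individual value. The tabulated value 2.201 (t for 11 degrees of freedom, two-sided 95 per cent) and the function name `limits_at` are assumptions of this sketch.

```python
import math

# Summary statistics from Section 8.9 (Table 8.3 data).
N = 13
X_BAR = 7.0
SUM_X_SQ_DEV = 182.0     # sum of squared deviations of X about its mean
B0, B1 = 3.962, 7.478
S2_E = 317.18 / (N - 2)  # residual mean square
T_975_11 = 2.201         # tabulated t, 11 d.f., two-sided 95 per cent


def limits_at(x, t_value=T_975_11):
    """Confidence limits for the mean and prediction limits for one Y at x."""
    y_hat = B0 + B1 * x
    spread = 1.0 / N + (x - X_BAR) ** 2 / SUM_X_SQ_DEV
    half_mean = t_value * math.sqrt(S2_E * spread)
    half_individual = t_value * math.sqrt(S2_E * (1.0 + spread))
    return {
        "y_hat": y_hat,
        "mean": (y_hat - half_mean, y_hat + half_mean),
        "individual": (y_hat - half_individual, y_hat + half_individual),
    }


for x in (7, 10, 13):
    result = limits_at(x)
    print(x, round(result["y_hat"], 2),
          tuple(round(v, 2) for v in result["mean"]),
          tuple(round(v, 2) for v in result["individual"]))
```

Both intervals are narrowest at the mean of X and widen as X moves away from it, and at every X the prediction interval is the wider of the two.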

The nature of the confidence and prediction intervals specified by
Equations (8.38) and (8.39) is illustrated in Figure 8.6. The most
noticeable feature of Figure 8.6 is the curvature of the confidence and
prediction limits. That is, our estimates are most precise at the average
value of X and may be almost useless at values of X far removed from
$\bar{X}$. By "almost useless," we mean that the confidence and
prediction intervals may turn out to be so wide as to render them of
little value. To state the preceding conclusion in a positive rather
than a negative fashion, any estimate of the mean value of Y for a given
X or any prediction about an individual Y associated with a given X will
be most meaningful for those values of X near $\bar{X}$.

FIG. 8.6. Graphical representation of the confidence and prediction
intervals specified by Equations (8.38) and (8.39).

As a corollary to the preceding paragraph, it is clear that if the
estimation of $\beta_0$ is of prime importance, the values of X should
have been selected (prior to collecting data) so that $\bar{X} = 0$. The
reason for this statement should be clear. It is, of course, that by so
choosing the X-values, the narrowest confidence and/or prediction
interval will occur at $X = 0$, and it is at this value of X that
$\hat{Y} = b_0$.

Following up the line of thought started in the preceding paragraph,
one might wonder if choosing the X-values in accordance with the
expressed recommendation is best for all purposes. For example, if the
estimation of $\beta_1$ rather than $\beta_0$ is of prime importance,
should the values of the controlled variable still be selected so that
$\bar{X} = 0$? The answer is, "Definitely not." If, then, our only
interest lies in $\beta_1$ (and it frequently does), how should the
values of X be chosen? In this case the appropriate recommendation is:
select two values of X (as far apart as is reasonable) and obtain random
observations on the Y-variable at only those two X-values. By following
this rule, the standard error of $b_1$ will be made as small as possible
subject to the (uncontrollable) magnitude of $s_E$. In other words, if we
proceed as indicated, the confidence interval for $\beta_1$ should be
kept "small." (NOTE: The reader can verify, heuristically, the wisdom of
this approach by noting that widely divergent X-values will increase
$\sum x^2$, the denominator of Equation (8.29), and thus decrease the
size of $s_{b_1}$.)
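
The heuristic argument in the preceding note can be checked numerically. The sketch below compares two hypothetical designs with 13 observations over the same range of X: one spread evenly from 1 to 13, the other with observations only at the two extremes. The designs, the assumed common residual mean square of 28.83, and the function name are illustrative choices, not part of the text.

```python
def sum_sq_dev(xs):
    """Sum of squared deviations of the X-values about their mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


# Two hypothetical designs with n = 13 over the range 1 to 13.
evenly_spaced = list(range(1, 14))   # one observation at each of 1, 2, ..., 13
two_point = [1] * 6 + [13] * 7       # observations only at the two extremes

# Residual mean square, assumed here to be the same under either design.
s2_e = 28.83

for name, design in [("evenly spaced", evenly_spaced), ("two point", two_point)]:
    sxx = sum_sq_dev(design)
    standard_error = (s2_e / sxx) ** 0.5   # standard error of the slope
    print(f"{name:14s} sum x^2 = {sxx:8.2f}   s_b1 = {standard_error:.3f}")
```

Under these assumptions the two-point design more than doubles the sum of squared deviations of X, and the standard error of the slope falls accordingly.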


FIG.  8.7—  Illustration  of  the  danger  of  extrapolation. 

Another fact which should not be lost sight of is that predicting
values of Y for a given X value is even more hazardous than already
indicated if we attempt such a procedure for an X value outside the
range of the chosen values of X used in obtaining the sample regression
line. That is, extrapolation beyond the observed range of the
independent variable is very risky unless we are reasonably certain that the
same regression function holds over a wider range of X-values
than we have in our sample. A simple illustration will suffice to point
out the possible trouble. Suppose we have values of X and Y which
plot (see dots) as in Figure 8.7. In the given range of X, a straight line
appears to be a good fit to the data and we might be tempted to project
our regression line farther in both directions. However, it is entirely
possible that if we had chosen a wider range of X-values and observed
the associated Y-values (see circled dots), a second degree polynomial
might have been indicated as the true form of the regression function
rather than the straight line we have drawn. You can readily see that
predicting values of Y using an extrapolation of the straight line could
lead to serious errors. Therefore, the research worker is advised to
act with caution whenever he makes predictions which involve going
outside the observed range of the independent variable.
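
The danger can be illustrated with a small numerical sketch. Everything in it is hypothetical: the true regression function is assumed to be the second degree polynomial $Y = X^2$, observed without error at X = 4, 5, 6 only, and a straight line is fitted by least squares to those three points.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def true_response(x):
    # Hypothetical second degree regression function (an assumption).
    return x ** 2

observed_x = [4.0, 5.0, 6.0]
observed_y = [true_response(x) for x in observed_x]
b0, b1 = fit_line(observed_x, observed_y)

def prediction_error(x):
    """Fitted straight-line value minus the true response at x."""
    return (b0 + b1 * x) - true_response(x)

error_inside = prediction_error(5.0)    # inside the observed range
error_outside = prediction_error(10.0)  # far outside the observed range
print(error_inside, error_outside)
```

Within the observed range the straight line is off by less than one unit, while at X = 10 it misses the assumed true response by more than twenty units.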

One  further  remark  and  we  shall  proceed  to  the  subject  of  testing 
hypotheses.  Although  only  two-sided  confidence  and  prediction  limits 
have  been  discussed  in  this  section,  the  reader  will  realize  that  one- 
sided  limits  should  be  used  if  the  problem  calls  for  such  a  procedure. 
If  only  an  upper  or  lower  limit  is  required,  the  researcher  should  make 
the  same  changes  in  procedure  as  outlined  in  Chapter  6  but  continue, 
of  course,  to  use  the  statistics  specified  in  Section  8.11. 

8.13     TESTS OF HYPOTHESES IN SIMPLE LINEAR REGRESSION

Sometimes the researcher is interested in determining whether the
estimated slope $b_1$ is significantly different from some hypothesized
value of $\beta_1$, say $\beta_1'$. That is, he wishes to test the hypothesis
$H\colon \beta_1 = \beta_1'$ against the alternative $A\colon \beta_1 \neq \beta_1'$. The appropriate
test statistic is

    $t = (b_1 - \beta_1')/s_{b_1}$    (8.40)

where $s_{b_1}$ is defined by Equation (8.29). The hypothesis H would then
be rejected if

    $t \geq t_{(1-\alpha/2)(n-2)}$    (8.41)

or if

    $t \leq -t_{(1-\alpha/2)(n-2)}$.    (8.42)

A common value of $\beta_1'$ is 0 since this reflects the hypothesis that Y is
independent of X (in a linear sense); that is, that X is of no value in
predicting Y if a linear approximation is used.

Example  8.2 

Referring to Example 8.1 and letting $\alpha = 0.01$, test the hypothesis
$H\colon \beta_1 = 0$. Calculation yields $t = (7.478 - 0)/0.398 = 18.788 > t_{0.995(11)}$
= 3.106. Thus, the hypothesis is rejected.
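
The arithmetic of Example 8.2 can be sketched as follows. The function names are illustrative only. The standard error 0.398 is the value implied by the quoted t (7.478/18.788), and the critical value 3.106 is taken from the example rather than computed.

```python
def slope_t_statistic(b1, beta1_hypothesized, s_b1):
    """Test statistic for H: beta_1 = beta_1' in simple linear regression."""
    return (b1 - beta1_hypothesized) / s_b1

def reject_two_sided(t, t_critical):
    """Two-sided rejection rule: reject if t falls in either tail."""
    return t >= t_critical or t <= -t_critical

t = slope_t_statistic(b1=7.478, beta1_hypothesized=0.0, s_b1=0.398)
reject = reject_two_sided(t, t_critical=3.106)
print(round(t, 2), reject)
```

The small difference from the 18.788 quoted in the example comes from rounding $s_{b_1}$ to three decimals.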

It is worth noting that the test of the hypothesis $H\colon \beta_1 = 0$ can also be
performed directly from the analysis of variance table. To illustrate
the nature of this alternative procedure, consider Table 8.6. Because
of the assumptions made in Section 8.10, it is possible to demonstrate
that the expected values of two of the mean squares are as shown in
Table 8.6. Thus, if $H\colon \beta_1 = 0$ is true, both the "mean square due to
$b_1 \mid b_0$" and the "residual mean square" are estimates of the same
quantity, namely, $\sigma_E^2$. It seems logical, therefore, to examine the ratio

    $F = (\text{mean square due to } b_1 \mid b_0)/(\text{residual mean square})$    (8.43)

and if this ratio is significantly larger than 1, some doubt would be
cast upon the validity of the hypothesis $H\colon \beta_1 = 0$. Since it may be
demonstrated that the F-ratio specified by Equation (8.43) follows an
F-distribution with $\nu_1 = 1$ and $\nu_2 = n - 2$ degrees of freedom, the
hypothesis $H\colon \beta_1 = 0$ will be rejected if $F \geq F_{(1-\alpha)(1,\,n-2)}$.

TABLE 8.6 - General Analysis of Variance for Simple Linear Regression
Showing the Expected Mean Squares

Source of            Degrees of   Sum of                    Mean Square                                 Expected Mean
Variation            Freedom      Squares                                                               Square

Due to $b_0$         1            $(\sum Y)^2/n$            $(\sum Y)^2/n$
Due to $b_1 \mid b_0$    1            $b_1 \sum xy$             $b_1 \sum xy$                               $\sigma_E^2 + \beta_1^2 \sum x^2$
Residual             n - 2        $\sum (Y - \hat{Y})^2$    $s_E^2 = \sum (Y - \hat{Y})^2/(n - 2)$      $\sigma_E^2$

Total                n            $\sum Y^2$
Example 8.3

Referring to Table 8.4, it is seen that F = 10,177.59/28.83 = 353.02.
Since this exceeds $F_{0.99(1,11)} = 9.65$, the hypothesis $H\colon \beta_1 = 0$ is
rejected. This is the same conclusion reached in Example 8.2. (NOTE: The
fact that $t_\nu^2 = F_{(1,\nu)}$ is the connecting link between the equivalent test
procedures.)
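
The connecting link in the NOTE can be checked with the figures of Examples 8.2 and 8.3. This is only an arithmetic sketch: the mean squares are those quoted from Table 8.4, and 0.398 is again the rounded standard error of the slope.

```python
mean_square_b1 = 10177.59   # mean square due to b1 | b0 (Table 8.4)
residual_mean_square = 28.83

f_ratio = mean_square_b1 / residual_mean_square
t_statistic = 7.478 / 0.398          # Example 8.2, with beta_1' = 0
t_squared = t_statistic ** 2

# Critical value quoted in Example 8.3 for alpha = 0.01 with (1, 11) d.f.
f_critical = 9.65
reject = f_ratio >= f_critical
print(round(f_ratio, 2), round(t_squared, 2), reject)
```

The two quantities agree to within the rounding of the inputs, as they must for a test with one degree of freedom in the numerator.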

Other test procedures in simple linear regression are concerned with
such hypotheses as:

(1) $H\colon \beta_0 = \beta_0'$;

(2) $H\colon \mu_{Y|X=X_0} = \mu_0$;

(3) $H\colon \beta_0 = \beta_0'$ and $\beta_1 = \beta_1'$.

Rather than discuss these in detail, we shall only indicate the
appropriate test procedures. For the three cases just mentioned, the
respective test statistics are:

    $t = (b_0 - \beta_0')/s_{b_0}$    (8.44)

where $s_{b_0}$ is defined by Equation (8.32),

    $t = (\hat{Y} - \mu_0)/s_{\hat{Y}}$    (8.45)

where $s_{\hat{Y}}$ is defined by Equation (8.30) and evaluated at $X = X_0$, and

    $F = [n(b_0 - \beta_0')^2 + 2n\bar{X}(b_0 - \beta_0')(b_1 - \beta_1') + \sum X^2 (b_1 - \beta_1')^2]/(2 s_E^2)$.    (8.46)

The test procedures are summarized in Table 8.7.

TABLE 8.7 - Summary of Test Procedures in Simple Linear Regression

Hypothesis                                       Statistic   Equation   Rejection Region

$\beta_0 = \beta_0'$                             t           8.44       $t \geq t_{(1-\alpha/2)(n-2)}$ or $t \leq -t_{(1-\alpha/2)(n-2)}$
$\beta_1 = \beta_1'$                             t           8.40       $t \geq t_{(1-\alpha/2)(n-2)}$ or $t \leq -t_{(1-\alpha/2)(n-2)}$
$\mu_{Y|X=X_0} = \mu_0$                          t           8.45       $t \geq t_{(1-\alpha/2)(n-2)}$ or $t \leq -t_{(1-\alpha/2)(n-2)}$
$\beta_0 = \beta_0'$ and $\beta_1 = \beta_1'$    F           8.46       $F \geq F_{(1-\alpha)(2,\,n-2)}$

Again we shall do no more than remind the reader of the possibility
of one-sided test procedures. By this time the method of procedure in
such cases should be obvious once the two-tailed tests have been
specified.

8.14      INVERSE PREDICTION IN SIMPLE LINEAR REGRESSION


The equation $\hat{Y} = b_0 + b_1 X$ may sometimes be used to estimate the
unknown value of X associated with an observed Y value. For example,
suppose that in addition to the data of Table 8.3, we have a
Schopper-Riegler reading of Y = 60 but the hours of beating, X, are unknown.
How shall this unknown value be estimated? The procedure is as
follows. Compute

    $\hat{X} = (Y_0 - b_0)/b_1$    (8.47)

where $Y_0$ is the observed value of Y for which we desire to estimate the
associated X value. A $100\gamma$ per cent confidence interval for the true
but unknown X value is defined by

    $L, U = \bar{X} + b_1 (Y_0 - \bar{Y})/D \pm (t s_E/D) \sqrt{B (Y_0 - \bar{Y})^2 + D (n + 1)/n}$    (8.48)

where

    $B = 1/\sum x^2$,    (8.49)

    $D = b_1^2 - t^2 s_{b_1}^2$,    (8.50)

and

    $t = t_{[(1+\gamma)/2](n-2)}$.    (8.51)

If, as is frequently the case, one has several (say m) values of Y
associated with the unknown X, Equations (8.47) and (8.48) are
modified to read

    $\hat{X} = (\bar{Y}_0 - b_0)/b_1$    (8.52)

and

    $L, U = \bar{X} + b_1 (\bar{Y}_0 - \bar{Y})/D \pm (t s_E'/D) \sqrt{B (\bar{Y}_0 - \bar{Y})^2 + D (n + m)/(nm)}$    (8.53)

where

    $\bar{Y}_0 = \sum_{i=1}^{m} Y_{0i}/m$,    (8.54)

    $(s_E')^2 = [(n - 2) s_E^2 + \sum_{i=1}^{m} (Y_{0i} - \bar{Y}_0)^2]/(n + m - 3)$,    (8.55)

    $t = t_{[(1+\gamma)/2](n+m-3)}$,    (8.56)

and B and D are the same as before. Since, in practice, m is usually
quite small relative to n, the computational labor may be reduced
materially by using $s_E^2$ rather than $(s_E')^2$. This leads, of course, to an
approximate solution but one which is sufficiently accurate for most
situations.

Example  8.4 

Consider the problem posed at the beginning of this section. Using
Equation (8.47) and the results of Section 8.9, we obtain $\hat{X} = 7.494$.
Using Equations (8.48) through (8.51), a 95 per cent confidence interval
estimate is determined to be (5.85, 9.15).
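
The figures of Example 8.4 can be reproduced with the short sketch below. It assumes the standard interval for inverse prediction, obtained by finding the X-values at which $Y_0$ lies exactly on the limits of the prediction interval; it is offered as a check on the arithmetic, not as a transcription of Equations (8.48) through (8.51). The inputs are the sums shown in Table 8.13 (n = 13, $\sum X = 91$, $\sum Y = 732$, $\sum X^2 = 819$), the fitted line $\hat{Y} = 3.962 + 7.478X$, the residual mean square 28.83, and the tabulated value $t_{0.975(11)} = 2.201$. The function name is hypothetical.

```python
import math

def inverse_prediction(y0, b0, b1, n, mean_x, mean_y, sum_x2, s_e2, t):
    """Point estimate and confidence limits for the X that produced y0.

    sum_x2 is the sum of squared deviations of X about its mean and
    s_e2 is the residual mean square of the fitted line.
    """
    x_hat = (y0 - b0) / b1
    s_b1_sq = s_e2 / sum_x2
    d = b1 ** 2 - t ** 2 * s_b1_sq
    if d <= 0:
        # The slope is not significantly different from zero, so no
        # finite interval of this form exists.
        raise ValueError("slope not significant; interval undefined")
    dev = y0 - mean_y
    centre = mean_x + b1 * dev / d
    half_width = (t * math.sqrt(s_e2) / d) * math.sqrt(
        dev ** 2 / sum_x2 + d * (n + 1) / n
    )
    return x_hat, centre - half_width, centre + half_width

n = 13
sum_x, sum_y, sum_x_squared = 91.0, 732.0, 819.0
x_hat, lower, upper = inverse_prediction(
    y0=60.0, b0=3.962, b1=7.478, n=n,
    mean_x=sum_x / n, mean_y=sum_y / n,
    sum_x2=sum_x_squared - sum_x ** 2 / n,
    s_e2=28.83, t=2.201,
)
print(round(x_hat, 3), round(lower, 2), round(upper, 2))
```

Note that the interval is not symmetric about $\hat{X}$: its centre is pulled slightly away from $\bar{X}$ by the factor $b_1^2/D$.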

8.15     THE  ABBREVIATED   DOOLITTLE   METHOD 

Before proceeding to the consideration of regression problems of
greater complexity than simple linear regression, let us digress long
enough to study the mechanics of a method for solving a set of
simultaneous linear equations whose coefficients form a symmetric matrix.
This digression will be well worth while for several reasons, namely:

(1)  In most regression problems, the postulated mathematical
model is linear in the unknown coefficients.

(2)  The method is well suited to programming for high-speed
computers as well as being useful when only desk calculators
are available.

(3)  It incorporates self-checking features which permit
verification of the accuracy of the arithmetic calculations at each
stage.

Several methods of solving sets of simultaneous linear equations (or
of inverting matrices) appear in the literature. Each of these is a
variation of a basic procedure, the variations being introduced to
accomplish a particular aim of the person proposing the special technique.
In this book, only one method will be discussed. It is known as the
Abbreviated Doolittle Method.

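To illustrate the Abbreviated Doolittle Method, let us consider the
problem of estimating

    $\eta = \beta_0 X_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$    (8.57)

where $X_0$ is a dummy variable which always takes the value 1 (i.e.,
$X_0 \equiv 1$). The method of least squares would lead to

    $\hat{Y} = b_0 X_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4$    (8.58)

and the coefficients $b_i$ (i = 0, 1, 2, 3, 4) would be found by solving the
equations

    $(\sum X_0 X_0) b_0 + (\sum X_0 X_1) b_1 + (\sum X_0 X_2) b_2 + (\sum X_0 X_3) b_3 + (\sum X_0 X_4) b_4 = \sum X_0 Y$
    $(\sum X_1 X_0) b_0 + (\sum X_1 X_1) b_1 + (\sum X_1 X_2) b_2 + (\sum X_1 X_3) b_3 + (\sum X_1 X_4) b_4 = \sum X_1 Y$
    $(\sum X_2 X_0) b_0 + (\sum X_2 X_1) b_1 + (\sum X_2 X_2) b_2 + (\sum X_2 X_3) b_3 + (\sum X_2 X_4) b_4 = \sum X_2 Y$    (8.59)
    $(\sum X_3 X_0) b_0 + (\sum X_3 X_1) b_1 + (\sum X_3 X_2) b_2 + (\sum X_3 X_3) b_3 + (\sum X_3 X_4) b_4 = \sum X_3 Y$
    $(\sum X_4 X_0) b_0 + (\sum X_4 X_1) b_1 + (\sum X_4 X_2) b_2 + (\sum X_4 X_3) b_3 + (\sum X_4 X_4) b_4 = \sum X_4 Y$

If the data, consisting of n observations on each of the variables, are
written in matrix form, namely,

    $Y = [Y_1, Y_2, \ldots, Y_n]'$

and

    $X = \begin{bmatrix} X_{01} & X_{11} & X_{21} & X_{31} & X_{41} \\ X_{02} & X_{12} & X_{22} & X_{32} & X_{42} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ X_{0n} & X_{1n} & X_{2n} & X_{3n} & X_{4n} \end{bmatrix}$

then Equations (8.59) may be written as

    $(X'X)B = X'Y$    (8.60)

where $B' = [b_0, b_1, b_2, b_3, b_4]$. To simplify the writing, we shall denote
$X'X$ by A and $X'Y$ by G. In this notation Equation (8.60) appears as

    $AB = G$.    (8.61)

If we denote $A^{-1}$ by C, then

    $C = A^{-1} = (X'X)^{-1}$    (8.62)

and

    $B = A^{-1}G = (X'X)^{-1}G = CG$.    (8.63)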
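
As a small illustration of Equations (8.59) through (8.61), the sketch below forms $A = X'X$ and $G = X'Y$ from raw data and solves the resulting pair of equations. It uses the twelve observations of Table 8.14 (coded temperature and per cent of impurities) with the dummy variable $X_0 = 1$. The function name is hypothetical, and with only two coefficients the system is solved by Cramer's rule rather than by the Doolittle method.

```python
def normal_equations(x_rows, y):
    """Return A = X'X and G = X'Y for data given as rows of X values."""
    q = len(x_rows[0])
    a = [[sum(row[i] * row[j] for row in x_rows) for j in range(q)]
         for i in range(q)]
    g = [sum(row[i] * y_k for row, y_k in zip(x_rows, y)) for i in range(q)]
    return a, g

coded_temperature = [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
impurities = [6.4, 5.6, 6.0, 7.5, 6.5, 8.3, 7.7, 11.7, 10.3, 17.6, 18.0, 18.4]

# Each row is [X0, X1], with the dummy variable X0 identically equal to 1.
x_rows = [[1.0, float(x)] for x in coded_temperature]
a, g = normal_equations(x_rows, impurities)

# Solve the 2 x 2 system A B = G by Cramer's rule.
det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
b0 = (g[0] * a[1][1] - a[0][1] * g[1]) / det
b1 = (a[0][0] * g[1] - a[1][0] * g[0]) / det
print(a, g, round(b0, 2), round(b1, 2))
```

The coefficients agree with the equation $\hat{Y} = 1.76 + 2.86X_1$ reported for these data in Section 8.17.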

The Abbreviated Doolittle Method will now be used to obtain:

(1)  the b's,

(2)  the sums of squares associated with the sequential fitting of
the b's,

(3)  the estimated variances of the b's,

(4)  the estimated covariances between pairs of b's, and

(5)  the elements of the inverse of the $X'X$ matrix, that is, the
elements of C.

The mechanics of the forward solution using the Abbreviated
Doolittle Method are summarized in Table 8.8. A discussion of the steps
involved is best given in two parts, one associated with the first section
of the table and one associated with all the succeeding sections:

First Section [Rows (0) Through (4)]

(1)  In the front half of the table are entered the elements of the
matrix of coefficients defined by $X'X$, omitting those obvious
from symmetry. That is, we have entered $a_{ij} = \sum X_i X_j$,
recognizing that $a_{ji} = a_{ij}$.

(2)  In the column headed "constant terms" are entered the
elements of the vector $X'Y$. That is, we have entered $g_i = \sum X_i Y$.

(3)  In the back half of the table are entered the elements of
the identity matrix, again omitting those obvious from
symmetry.

(4)  In the check column are entered the sums of all entries in the
corresponding rows, including those elements omitted because
of symmetry.

Succeeding Sections [Rows (5) Through (14)]

(1)  Each entry in a given row is generated according to the
instruction specified for that row. (See the first column of the
table.) This applies to the front half, the constant terms, the
back half, and the check column.

(2)  The sum of all the entries in a row (with the exception of the
entry in the check column) should equal (within rounding
error) the entry in the check column. The advantage of the
checking procedure should be obvious: If an arithmetic error
has been made, it can be detected and corrected before
calculations are started on the next row.

(3)  Steps such as those described are continued until a row is
reached in which only a single $B_{pq}$ appears. With the
calculation of all entries in this row and the satisfaction of the
"check," the forward solution is complete.

The next step in the analysis is the completion of the backward
solution. This will be performed in two parts, one to determine the b's
and the other to determine the $c_{ij}$ values.

Determination of the b's

The forward solution of the Abbreviated Doolittle Method has
provided us with the following set of equations:

    $(1)b_0 + (B_{01})b_1 + (B_{02})b_2 + (B_{03})b_3 + (B_{04})b_4 = B_{0y}$
    $(1)b_1 + (B_{12})b_2 + (B_{13})b_3 + (B_{14})b_4 = B_{1y}$
    $(1)b_2 + (B_{23})b_3 + (B_{24})b_4 = B_{2y}$    (8.64)
    $(1)b_3 + (B_{34})b_4 = B_{3y}$
    $(1)b_4 = B_{4y}$

Solving these in reverse order (hence the name "backward solution"),
we obtain:

    $b_4 = B_{4y}$
    $b_3 = B_{3y} - b_4 B_{34}$
    $b_2 = B_{2y} - b_4 B_{24} - b_3 B_{23}$    (8.65)
    $b_1 = B_{1y} - b_4 B_{14} - b_3 B_{13} - b_2 B_{12}$
    $b_0 = B_{0y} - b_4 B_{04} - b_3 B_{03} - b_2 B_{02} - b_1 B_{01}$

Determination of the $c_{ij}$

(1)  Since $C = A^{-1} = (X'X)^{-1}$ is the inverse of a symmetric matrix,
it will be symmetric. This reduces the number of calculations
to be performed since $c_{ji} = c_{ij}$.

(2)  All $c_{ij}$ values may be calculated using the equation

    $c_{ij} = \sum_{k=0}^{4} A'_{ki} B'_{kj}$    (8.66)

in which some of the $A'$ values may be 0 or 1 and some of the
$B'$ values may be 0. It should be noted that some of the $c_{ij}$
values may be read directly from the forward solution, namely,

    $c_{40} = B'_{40}$
    $c_{41} = B'_{41}$
    $c_{42} = B'_{42}$
    $c_{43} = B'_{43}$
    $c_{44} = B'_{44}$

(3)  If we choose to ignore the symmetry mentioned in (1) and
calculate all the $c_{ij}$ independently, a check can be made on the
arithmetic by comparing $c_{ij}$ and $c_{ji}$.

(4)  A final check could be made by seeing if CA equals I. It
should. However, rounding errors may cause minor
discrepancies.
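
The forward and backward solutions described above can be sketched in a few lines. The sketch makes several assumptions: the function name is hypothetical, the full symmetric matrix is supplied (nothing is omitted for symmetry), the check column is left out, and the primed quantities of Equation (8.66) are taken to be the back-half entries of the A-rows and B-rows. It is applied to the two-coefficient system of Table 8.13, where $\sum X_0 X_0 = 13$, $\sum X_0 X_1 = 91$, $\sum X_1 X_1 = 819$, $\sum X_0 Y = 732$, and $\sum X_1 Y = 6485$.

```python
def abbreviated_doolittle(a, g):
    """Solve A b = g for symmetric A; also return C = A^{-1} and the
    sequential sums of squares A_iy * B_iy."""
    q = len(a)
    # Each working row: front half (A), constant term, back half (identity).
    work = [list(a[i]) + [g[i]] + [1.0 if i == j else 0.0 for j in range(q)]
            for i in range(q)]
    a_rows, b_rows = [], []
    # Forward solution: build an A-row, then divide by its pivot for the B-row.
    for i in range(q):
        row = work[i][:]
        for k in range(i):
            multiplier = a_rows[k][i]
            row = [r - multiplier * b_k for r, b_k in zip(row, b_rows[k])]
        pivot = row[i]
        a_rows.append(row)
        b_rows.append([r / pivot for r in row])
    # Backward solution for the b's.
    b = [0.0] * q
    for i in reversed(range(q)):
        b[i] = b_rows[i][q] - sum(b_rows[i][j] * b[j] for j in range(i + 1, q))
    # Elements of C from the back halves of the A-rows and B-rows.
    c = [[sum(a_rows[k][q + 1 + i] * b_rows[k][q + 1 + j] for k in range(q))
          for j in range(q)] for i in range(q)]
    # Sequential sums of squares, one per coefficient in order of fitting.
    seq_ss = [a_rows[i][q] * b_rows[i][q] for i in range(q)]
    return b, c, seq_ss

a = [[13.0, 91.0], [91.0, 819.0]]
g = [732.0, 6485.0]
b, c, seq_ss = abbreviated_doolittle(a, g)
print([round(v, 3) for v in b], [round(v, 2) for v in seq_ss])
```

The results agree with Example 8.6 ($b_0 = 3.962$, $b_1 = 7.478$) and with the sum of squares 10,177.59 quoted from Table 8.4. For hand computation the check column is well worth keeping; it is omitted here only for brevity.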

Having completed both the forward and backward solutions using
the Abbreviated Doolittle Method, the next step is to indicate how to
obtain the analysis of variance table and the standard errors
associated with the various statistics. Using the formula

    S.S. due to $b_i \mid b_0, b_1, \ldots, b_{i-1} = A_{iy} B_{iy}$,    (8.67)

the various sums of squares may be evaluated easily once the forward
solution of the Abbreviated Doolittle Method has been completed.
The analysis of variance may then be recorded as in Table 8.9. If we
do not wish to record the reduction in the residual sum of squares
associated with the sequential fitting of each additional b, it is proper to
note that the

    S.S. due to regression $= \sum_{i=1}^{4} A_{iy} B_{iy}$    (8.68)

and this pooled sum of squares possesses 4 degrees of freedom. (NOTE:
The sum of squares due to $b_0$ is still recorded separately since it
is actually $(\sum Y)^2/n$, that is, it is the correction for the mean.) The
estimated variances and covariances associated with the regression
coefficients are given by

    $s_{b_i}^2 = c_{ii} s_E^2$    (8.69)

and

    $s_{b_i b_j} = c_{ij} s_E^2$.    (8.70)

From these, we may obtain

    $s_{\hat{Y}}^2 = (X'CX) s_E^2$    (8.71)

which must be evaluated at the particular set of X-values for which an
estimate is desired.

TABLE 8.9 - Analysis of Variance Associated With the Multiple Regression
Problem Discussed in Table 8.8

Source of Variation                     Degrees of   Sum of                    Mean Square
                                        Freedom      Squares

Due to $b_0$                            1            $A_{0y}B_{0y}$            $A_{0y}B_{0y}$
Due to $b_1 \mid b_0$                   1            $A_{1y}B_{1y}$            $A_{1y}B_{1y}$
Due to $b_2 \mid b_0, b_1$              1            $A_{2y}B_{2y}$            $A_{2y}B_{2y}$
Due to $b_3 \mid b_0, b_1, b_2$         1            $A_{3y}B_{3y}$            $A_{3y}B_{3y}$
Due to $b_4 \mid b_0, b_1, b_2, b_3$    1            $A_{4y}B_{4y}$            $A_{4y}B_{4y}$
Residual                                n - 5        $\sum (Y - \hat{Y})^2$    $s_E^2 = \sum (Y - \hat{Y})^2/(n - 5)$

Total                                   n            $\sum Y^2$
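
A brief sketch of how the elements of C are used to obtain standard errors follows. It is based on the simple linear regression of Section 8.9, for which $X'X$ has elements 13, 91, and 819 (Table 8.13), so that its inverse can be written exactly as $c_{00} = 63/182$, $c_{01} = -7/182$, $c_{11} = 1/182$; the residual mean square is taken to be 28.83. The function names are illustrative only.

```python
import math

def coefficient_standard_errors(c, s_e2):
    """Standard errors of the b's: square roots of c_ii * s_E^2."""
    return [math.sqrt(c[i][i] * s_e2) for i in range(len(c))]

def variance_of_estimate(x_vector, c, s_e2):
    """Variance of the estimated mean of Y at x_vector: (X'CX) * s_E^2."""
    q = len(x_vector)
    quadratic_form = sum(x_vector[i] * c[i][j] * x_vector[j]
                         for i in range(q) for j in range(q))
    return quadratic_form * s_e2

c = [[63.0 / 182.0, -7.0 / 182.0],
     [-7.0 / 182.0, 1.0 / 182.0]]
s_e2 = 28.83

s_b0, s_b1 = coefficient_standard_errors(c, s_e2)
covariance_b0_b1 = c[0][1] * s_e2

# Evaluate at X0 = 1 and X1 = 7, which is the mean of X in this example.
var_at_mean = variance_of_estimate([1.0, 7.0], c, s_e2)
print(round(s_b1, 3), round(var_at_mean, 4))
```

The value of $s_{b_1}$ agrees with the 0.398 used in Example 8.2, and at $X = \bar{X}$ the variance of the estimate reduces to $s_E^2/n$, as it should.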

Example 8.5

Consider the data given in Table 8.10. The Abbreviated Doolittle
Method is applied to these data in Table 8.11, where the X's and Y
have been coded so that the elements of $X'X$ and $X'Y$ are
approximately of the same order of magnitude. (This is done to facilitate the
computations.) In this case, the coding is as follows: divide $X_0$ by 10,
$X_1$ by 100, $X_2$ by 10, $X_3$ by 1000, $X_4$ by 1000, and Y by 100. Then
Equation (8.65) is used to obtain

    $\hat{Y}/100 = -0.681468(X_0/10) + 0.227227(X_1/100) + 0.055349(X_2/10) - 1.495563(X_3/1000) + 1.546520(X_4/1000)$

and Equation (8.66) yields

    C =
    [ 2052.64     -135.410     -53.6729     -538.760      2.25605  ]
    [ -135.410     20.0032      0.273033     22.3301      0.549940 ]
    [ -53.6729     0.273033     2.73843      18.3161     -0.928040 ]
    [ -538.760     22.3301      18.3161      171.128     -11.6657  ]
    [  2.25605     0.549940    -0.928040    -11.6657      8.32196  ]

where the elements of C reflect the coding explained above. The analysis
of variance of the coded data is presented in Table 8.12.

Example  8.6 

The example of Section 8.9 is reworked in Table 8.13 using the
Abbreviated Doolittle Method. Again we get $\hat{Y} = 3.962 + 7.478X$. Also,
$c_{00} = 0.3460$, $c_{01} = -0.0384$, and $c_{11} = 0.0055$. It can be verified that the
sums of squares agree (within rounding error) with the values reported
in Table 8.4.

Although the Abbreviated Doolittle Method was explained with
reference to Equation (8.57), it should be noted that it applies equally
well to any situation where the equation is linear in the unknown
coefficients. For example, the following are typical of cases frequently
encountered for which the technique will prove useful:

(1)  $\eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$,

(2)  $\eta = \beta_0 + \beta_1 X_1 + \beta_{11} X_1^2 + \beta_{111} X_1^3$, and

(3)  $\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{12} X_1 X_2$.

Some of these will be considered in the sections and chapters to follow.


TABLE 8.10 - Crude Oil Properties and Actual Gasoline Yields

Crude Oil    Crude Oil    Crude Oil    Gasoline     Gasoline
Gravity,     Vapor        ASTM         End          Yield,
°API         Pressure,    10% Point,   Point,       Per Cent of
             PSIA         °F           °F           Crude Oil
X1           X2           X3           X4           Y

38.4         6.1          220          235          6.9
40.3         4.8          231          307          14.4
40.0         6.1          217          212          7.4
31.8         0.2          316          365          8.5
40.8         3.5          210          218          8.0
41.3         1.8          267          235          2.8
38.1         1.2          274          285          5.0
50.8         8.6          190          205          12.2
32.2         5.2          236          267          10.0
38.4         6.1          220          300          15.2
40.3         4.8          231          367          26.8
32.2         2.4          284          351          14.0
31.8         0.2          316          379          14.7
41.3         1.8          267          275          6.4
38.1         1.2          274          365          17.6
50.8         8.6          190          275          22.3
32.2         5.2          236          360          24.8
38.4         6.1          220          365          26.0
40.3         4.8          231          395          34.9
40.0         6.1          217          272          18.2
32.2         2.4          284          424          23.2
31.8         0.2          316          428          18.0
40.8         3.5          210          273          13.1
41.3         1.8          267          358          16.1
38.1         1.2          274          444          32.1
50.8         8.6          190          345          34.7
32.2         5.2          236          402          31.7
38.4         6.1          220          410          33.6
40.0         6.1          217          340          30.4
40.8         3.5          210          347          26.6
41.3         1.8          267          416          27.8
50.8         8.6          190          407          45.7

Source: Nilon H. Prater, "Estimate Gasoline Yields from Crudes," Petroleum Refiner
(now Hydrocarbon Processing and Petroleum Refiner), Vol. 35, No. 5, pp. 236-38, May,
1956. Copyright 1956, Gulf Publishing Co., Houston, Texas. By permission of the author
and publishers.

TABLE 8.12 - Analysis of Variance Associated With
the Regression Analysis of the Data in Table 8.10

Source of Variation                     Degrees of   Sum of Squares   Mean Square
                                        Freedom

Due to $b_0$                            1            1.236772         1.236772
Due to $b_1 \mid b_0$                   1            0.021625         0.021625
Due to $b_2 \mid b_0, b_1$              1            0.030985         0.030985
Due to $b_3 \mid b_0, b_1, b_2$         1            0.002921         0.002921
Due to $b_4 \mid b_0, b_1, b_2, b_3$    1            0.287399         0.287399
Residual                                27           0.013477         0.000499

Total                                   32           1.593179

TABLE 8.13 - Solution of the Example of Section 8.9
by the Abbreviated Doolittle Method

         Front Half             Constant       Back Half
Row      b0         b1          Terms                                    Check

(0)      13         91          732            1           0             837
(1)                 819         6485                       1             7396

(2)      13         91          732            1           0             837
(3)      1          7           56.3077        0.0769      0             64.3846

(4)                 182         1360.9993      -6.99790    1             1537.0014
(5)                 1           7.4780         -0.03845    0.0054945     8.44506

8.16      SOME  ADDITIONAL    REMARKS  WITH    REGARD   TO 
GENERALIZED    REGRESSION   ANALYSES 

There are a few additional items which merit discussion at this
time, for they have a great deal to do with the analysis of any set of
data when the regression technique is employed.

The first item to be discussed is notation. By now the reader may
have been wondering what the significance is of such symbolism as
$b_1 \mid b_0$ and $b_2 \mid b_0, b_1$. This notation is closely allied with the "conditional"
concept in probability. In fact, the notation is used in exactly the
same manner, that is, to indicate a condition or restriction. In the
present context, the notation calls attention to the fact that sums of
squares associated with various coefficients are obtained in a definite
(sequential) order. In particular, the sum of squares due to $b_0$ is found
first and is the same as finding the correction for the mean. (NOTE: $b_0$
itself is not equal to the mean; it also depends on the nature of the
mathematical model used to represent the data.) After finding the sum
of squares due to $b_0$, we find the sum of squares due to $b_1$; hence the
symbolism $b_1 \mid b_0$ which is read, "$b_1$ given that $b_0$ has been determined."
Referring to Table 8.9, the next sum of squares recorded was that
"due to $b_2 \mid b_0, b_1$" which implies that $b_2$ was found after $b_0$ and $b_1$ had
been determined. The remaining symbols in Table 8.9 may be explained
in a similar manner.

The next item to be discussed is that of testing various hypotheses
associated with the regression function. Each $b_i$ may be used to test
the hypothesis $H\colon \beta_i = 0$. This is accomplished by computing

    $t = b_i/(s_E \sqrt{c_{ii}})$    (8.72)

with $\nu = n - q$ degrees of freedom. In Table 8.9, q = 5, and thus
n - q = n - 5. An equivalent test procedure is provided by

    $F = (b_i^2/c_{ii})/(\text{residual mean square})$.    (8.73)

It should be noted that the various F-tests defined by Equation (8.73)
are not all independent since the X variables are correlated. However,
the tests provide useful information if interpreted wisely. Each of the
mean squares reported in Table 8.9 may also be used to form F-ratios
which provide additional important information. These F-ratios,
defined by

    $F = (\text{mean square due to } b_i \mid b_0, b_1, \ldots, b_{i-1})/(\text{residual mean square})$,    (8.74)

will assess the significance of the additional reductions in the residual
sum of squares achieved by fitting the b's in the particular order adopted
by the analyst. The phrase "in the particular order adopted by the
analyst" emphasizes an important point: The
order of fitting the coefficients has a decided effect on the analysis. As
we shall see later, if the variables $X_1$, $X_2$, etc., represent successive
powers of a single variable X (i.e., the mathematical model is a
polynomial in X), then a natural order is available. In other cases, the
order is a result of a decision on the part of the analyst to write the
terms of the model in a specific order when setting up the Doolittle
solution. One may, of course, make use of

    S.S. due to regression $= \sum_{i=1}^{q-1} A_{iy} B_{iy}$    (8.75)

and calculate

    $F = (\text{mean square due to regression})/(\text{residual mean square})$    (8.76)

in order to assess the over-all significance of fitting the regression
equation. Other tests which aid in determining the order of fitting and
the choice of variables to be retained in the regression equation are
available. However, discussion of these is not warranted in this book.
Instead, the reader is referred to other sources, such as Hader and
Grandage (10), for the pertinent details.
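
The ratios of Equations (8.72) through (8.76) can be computed from figures already reported for the gasoline yield example. The sketch below uses the sums of squares of Table 8.12, together with the coded coefficient $b_4 = 1.546520$ and the element $c_{44} = 8.32196$ from Example 8.5. No critical values are included; the variable names are illustrative only.

```python
import math

sequential_ss = {
    "b1 | b0": 0.021625,
    "b2 | b0, b1": 0.030985,
    "b3 | b0, b1, b2": 0.002921,
    "b4 | b0, b1, b2, b3": 0.287399,
}
residual_ss, residual_df = 0.013477, 27
residual_ms = residual_ss / residual_df

# Equation (8.74): one F-ratio per coefficient, in the order of fitting.
sequential_f = {name: ss / residual_ms for name, ss in sequential_ss.items()}

# Equations (8.75) and (8.76): pooled regression sum of squares, 4 d.f.
regression_ss = sum(sequential_ss.values())
overall_f = (regression_ss / len(sequential_ss)) / residual_ms

# Equations (8.72) and (8.73) for the last coefficient fitted.
b4, c44 = 1.546520, 8.32196
t_b4 = b4 / math.sqrt(residual_ms * c44)
f_b4 = (b4 ** 2 / c44) / residual_ms

for name, f_value in sequential_f.items():
    print(name, round(f_value, 1))
print(round(overall_f, 1), round(t_b4, 1), round(f_b4, 1))
```

For the last coefficient fitted, $b_4^2/c_{44}$ reproduces the sequential sum of squares 0.287399, so the tests of Equations (8.73) and (8.74) coincide. For coefficients fitted earlier the two generally differ, which is one way of seeing that the order of fitting matters.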

While not discussed at this time, it should be clear to the reader that
confidence interval estimates may be obtained through use of
techniques discussed in Chapter 6. The appropriate standard errors are
defined in Section 8.15. For further details the reader is again referred
to Hader and Grandage (10).

8.17      TESTS FOR LACK OF FIT

In  Section  8.11  the  assumption  was  made  that  the  failure  of  the 
model  to  fit  the  observations  exactly  was  solely  a  function  of  the 
errors.  This  assumption  is  seldom  true,  although  it  may  be  nearly  so 
in  many  cases.  To  check  on  its  validity,  one  must  have  some  measure 
of error other than that provided by the residual sum of squares. The
only  way  to  obtain  such  a  measure  is  to  insist  that  the  experiment  be 
repeated  some  number  of  times  at  at  least  one  value  of  the  independent 
(or  controlled)  variable.  In  addition,  it  is  also  wise  to  insist  on  running 
the  experiment  at  as  many  different  values  of  the  controlled  variable 
as  is  feasible.  In  the  example  considered  in  Section  8.9,  the  latter 
recommendation  was  followed  but  no  repetition  of  the  experiment  at 
any  value  of  the  controlled  variable  was  undertaken.  This  enabled  us 
to  make  a  visual  judgment  about  lack  of  fit  but  no  statistical  analysis 
was  possible. 

To examine the statistical test for lack of fit, consider the data
reported by Hunter (11). These data are reproduced in Table 8.14.
Introducing the dummy variable, $X_0$, which is identically equal to 1,
and using the Abbreviated Doolittle Method, the simple linear
regression equation is determined to be $\hat{Y} = 1.76 + 2.86X_1$. The
associated analysis of variance is given in Table 8.15.

In this example, however, it is possible to subdivide the residual
sum of squares into two parts: one part being an estimate of
experimental error and the other a measure of the lack of fit of the linear
model. The reason such a subdivision is possible should be clear: The
researcher was careful to provide some replication in the performance
of the experiment. To perform this subdivision of the residual
sum of squares, it is easier to calculate the experimental error sum of
squares and then obtain the lack of fit sum of squares by subtraction.
The experimental error sum of squares is found by pooling the sums of
squares of deviations about the mean for each value of the independent
variable, that is,

    $\sum_{\text{all } X} \{\sum Y^2 - (\sum Y)^2/n\}$    (8.77)

where the expression within the braces is calculated separately for each
value of X (n being, in each case, the number of observations at that
value of X). In the example,

Experimental Error Sum of Squares =

    {(6.4)^2 + (5.6)^2 + (6.0)^2 - (6.4 + 5.6 + 6.0)^2/3}
    + {(7.5)^2 + (6.5)^2 - (7.5 + 6.5)^2/2}
    + {(8.3)^2 + (7.7)^2 - (8.3 + 7.7)^2/2}
    + {(11.7)^2 + (10.3)^2 - (11.7 + 10.3)^2/2}
    + {(17.6)^2 + (18.0)^2 + (18.4)^2 - (17.6 + 18.0 + 18.4)^2/3} = 2.3000.

TABLE 8.14 - Percentage of Impurities at Different Temperatures

Temperature (°C.)    Coded Temperature    Per Cent of Impurities
                     X1                   Y

200                  1                    6.4
200                  1                    5.6
200                  1                    6.0
210                  2                    7.5
210                  2                    6.5
220                  3                    8.3
220                  3                    7.7
230                  4                    11.7
230                  4                    10.3
240                  5                    17.6
240                  5                    18.0
240                  5                    18.4

Reprinted from: J. S. Hunter, "Determination of Optimum Operating Conditions by
Experimental Methods, Part II-1, Models and Methods," Industrial Quality Control, Vol.
15, No. 6, pp. 16-24, Dec., 1958. By permission of the author and editor.

TABLE 8.15 - First Analysis of Variance for the Data of Table 8.14

Source of Variation       Degrees of   Sum of Squares   Mean Square
                          Freedom

Due to $b_0$              1            1281.3333        1281.3333
Due to $b_1 \mid b_0$     1            228.5715         228.5715
Residual                  10           40.3952          4.0395

Total                     12           1550.3000

Thus, it is now possible to record the results as in Table 8.16. In this table, the experimental error mean square is a pooled estimate of $\sigma_E^2$ that is uncontaminated by any inadequacy of the assumed linear model.

TABLE 8.16-Second Analysis of Variance for the Data of Table 8.14

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square    F-Ratio
Due to b0                      1               1281.3333        1281.3333
Due to b1 | b0                 1                228.5715         228.5715
Lack of fit                    3                 38.0952          12.6984      38.64**
Experimental error             7                  2.3000           0.3286
Total                         12               1550.3000

** Significant at the 1 per cent level.

To test the hypothesis that the linear model is appropriate (i.e., no real lack of fit exists), it is legitimate to obtain the ratio $F = 12.6984/0.3286 = 38.64$ with $\nu_1 = 3$ and $\nu_2 = 7$ degrees of freedom. Since this exceeds $F_{.99(3,7)} = 8.45$, the decision is reached that a serious lack of fit exists. That is, the hypothesis of no lack of fit is rejected. Another way of stating this conclusion is as follows: The assumed linear model inadequately describes the data.
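The partition of the residual sum of squares into lack of fit and experimental error can be checked numerically. The following is a minimal sketch in Python (the variable names are ours, not the text's); it refits the straight line to the data of Table 8.14 and recomputes the entries of Tables 8.15 and 8.16.

```python
from collections import defaultdict

# Data of Table 8.14: coded temperature X1 and per cent of impurities Y.
x = [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
y = [6.4, 5.6, 6.0, 7.5, 6.5, 8.3, 7.7, 11.7, 10.3, 17.6, 18.0, 18.4]
n = len(y)

# Straight-line fit by least squares (corrected sums of squares and products).
sxx = sum(v * v for v in x) - sum(x) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_b0 = sum(y) ** 2 / n              # sum of squares due to b0
ss_b1 = sxy ** 2 / sxx               # sum of squares due to b1 | b0
ss_total = sum(v * v for v in y)     # uncorrected total sum of squares
ss_residual = ss_total - ss_b0 - ss_b1

# Experimental error: within-group sum of squares at each distinct X.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
ss_error = sum(sum(v * v for v in g) - sum(g) ** 2 / len(g)
               for g in groups.values())
df_error = n - len(groups)           # 12 observations - 5 distinct X values

# Lack of fit is what remains of the residual after removing pure error.
ss_lack = ss_residual - ss_error
df_lack = (n - 2) - df_error
f_ratio = (ss_lack / df_lack) / (ss_error / df_error)

print(round(ss_residual, 4), round(ss_error, 4), round(ss_lack, 4))
print(df_lack, df_error, round(f_ratio, 2))
```

The computed F-ratio may differ from the tabled 38.64 in the second decimal place, because the table divides by the rounded mean square 0.3286.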

8.18      NONLINEAR   MODELS 

If the F-ratio in Table 8.16 had turned out to be nonsignificant, it would have been concluded that the linear fit was probably adequate. However, since the linear model was judged to be inadequate, the researcher is obliged to consider fitting some nonlinear model. That is, he should attempt to discover a different mathematical model which better describes (or represents) the observations.

There are many alternatives to be considered at this stage. For example, should a higher degree polynomial be investigated or should some other functional form be considered? Perhaps some exponential function might be the appropriate model for the problem under investigation. A few mathematical models, other than polynomials, which are frequently encountered in applied work are

$$\eta = \alpha\beta^{X}; \qquad \alpha > 0,\ \beta > 0 \tag{8.78}$$

$$\ln \eta = \ln \gamma + (\ln \alpha)\beta^{X}; \qquad \alpha > 0,\ \beta > 0,\ \gamma > 0 \tag{8.79}$$

$$1/\eta = \gamma + \alpha\beta^{X}; \qquad \alpha > 0,\ \beta > 0,\ \gamma > 0 \tag{8.80}$$

$$\eta = \ldots; \qquad \beta > 0,\ \gamma > 0. \tag{8.81}$$

These are usually referred to as: (1) the simple exponential or compound interest function, (2) the Gompertz function, (3) the logistic function, and (4) the Mitscherlich function, respectively.

The selection of an alternative to the linear model, i.e., to the first degree polynomial, is not easy. The choice should be made only after careful consideration of the basic mechanism of the system. If this is not feasible, scatter plots should be examined. When it is evident that some degree of curvature is present in the data but no clear-cut choice of mathematical model is possible, a reasonable approach is to systematically examine polynomials of increasing order (i.e., of higher degree).

8.19     SECOND  ORDER   MODELS 

Since the first order model (a straight line) was an inadequate representation of the data in Table 8.14, it is now proposed that a second order (quadratic) model be investigated. That is, the model

$$\eta = \beta_0 X_0 + \beta_1 X_1 + \beta_{11} X_1^2 \tag{8.82}$$

in which $X_0$ is identically 1, will be considered. Writing the data in matrix form, namely,

$$
Y = \begin{bmatrix} 6.4 \\ 5.6 \\ 6.0 \\ 7.5 \\ 6.5 \\ 8.3 \\ 7.7 \\ 11.7 \\ 10.3 \\ 17.6 \\ 18.0 \\ 18.4 \end{bmatrix},
\qquad
X = \begin{bmatrix}
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 2 & 4 \\
1 & 2 & 4 \\
1 & 3 & 9 \\
1 & 3 & 9 \\
1 & 4 & 16 \\
1 & 4 & 16 \\
1 & 5 & 25 \\
1 & 5 & 25 \\
1 & 5 & 25
\end{bmatrix}
$$

and using the Abbreviated Doolittle Method (see Table 8.17), the regression equation is found to be

$$\hat{Y} = 8.43 - 3.14X_1 + 1.00X_1^2. \tag{8.83}$$

The accompanying analysis of variance is given in Table 8.18.

A comparison of Tables 8.16 and 8.18 indicates: (1) Fitting a quadratic term led to a significant reduction in the lack of fit sum of squares, and (2) there is still a significant lack of fit. In other words, although the quadratic equation was a better fit than the linear equation, the second degree polynomial does not adequately describe the data. What, then, should be the next step? Should higher degree polynomials be investigated or should attention be directed toward some other functional form? As indicated in Section 8.18, the answer to such a question is not easily obtained. In fact, a "best" answer may not exist.
TABLE 8.17-The Fitting of Ŷ = b0 + b1X1 + b11X1² to the Data of Table 8.14 by the Abbreviated Doolittle Method

                 Front Half
Row     b0      b1          b11           Terms         Back Half                               Check
(0)     12      36          136           124            1           0           0              309
(1)             136         576           452                        1           0              1201
(2)                         2584          1920                                   1              5217
(3)     12      36          136           124            1           0           0              309
(4)     1       3           11.333333     10.333333      0.083333    0           0              25.75
(5)             28          168.000012    80.000012     -2.999988    1           0              274
(6)             1           6             2.857143      -0.107142    0.035714    0              9.785714
(7)                         34.666640     34.666654      6.666569   -5.999952    1              70.999931
(8)                         1             1              0.192305   -0.173076    0.028846       2.048077

TABLE 8.18-Third Analysis of Variance for the Data of Table 8.14

Source of Variation      Degrees of Freedom    Sum of Squares    Mean Square    F-Ratio
Due to b0                        1               1281.3333        1281.3333
Due to b1 | b0                   1                228.5715         228.5715     695.59**
Due to b11 | b0, b1              1                 34.6667          34.6667     105.50**
Lack of fit                      2                  3.4285           1.7142       5.22*
Experimental error               7                  2.3000           0.3286
Total                           12               1550.3000

* Significant at the 5 per cent level.
** Significant at the 1 per cent level.


Thus, rather than say what should be done, it seems politic to suggest a procedure which can be modified at the discretion of the analyst. The suggested procedure is: Rather than seek a better fit in terms of a higher degree polynomial (i.e., a degree greater than 2), it is probably wiser to cast about for some other functional form to represent the data. In the case under consideration (i.e., the data of Table 8.14), an examination of a scattergram would suggest that an exponential function be given serious consideration. The suggested procedure does not, of course, preclude the fitting of higher degree polynomials if that appears to be the proper approach. For example, if we consider the data analysed in Section 8.9, a third-degree polynomial will give an excellent fit. Of course, some form of exponential, perhaps of the logistic variety, might also be appropriate.
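The quadratic fit and the third analysis of variance can be reproduced without the Doolittle layout. The sketch below is only an illustration: it solves the normal equations by ordinary Gauss-Jordan elimination (a substitute for the Abbreviated Doolittle Method, not a transcription of it), and the helper name `solve` is ours.

```python
def solve(a, b):
    """Solve a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Data of Table 8.14.
x = [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
y = [6.4, 5.6, 6.0, 7.5, 6.5, 8.3, 7.7, 11.7, 10.3, 17.6, 18.0, 18.4]

# Columns of the X matrix: X0 = 1, X1, X1 squared.
cols = [[1.0] * len(x), [float(v) for v in x], [float(v * v) for v in x]]
xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
b0, b1, b11 = solve(xtx, xty)

# Sum of squares due to b0, b1, and b11 together, then the residual.
ss_regression = b0 * xty[0] + b1 * xty[1] + b11 * xty[2]
ss_residual = sum(v * v for v in y) - ss_regression

# Experimental error from the replicated observations at each X.
levels = sorted(set(x))
ss_error = 0.0
for level in levels:
    g = [yi for xi, yi in zip(x, y) if xi == level]
    ss_error += sum(v * v for v in g) - sum(g) ** 2 / len(g)
df_error = len(y) - len(levels)

ss_lack = ss_residual - ss_error
df_lack = (len(y) - 3) - df_error
f_lack = (ss_lack / df_lack) / (ss_error / df_error)

print(round(b0, 2), round(b1, 2), round(b11, 2))
print(round(ss_lack, 4), df_lack, round(f_lack, 2))
```

The first row of `xtx` and the vector `xty` correspond to rows (0) through (2) of Table 8.17, so the intermediate quantities can be compared with the table if desired.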

8.20     ORTHOGONAL   POLYNOMIALS

When the values of X (the independent variable) are equally spaced, there is another method of fitting polynomial regression equations which has much to recommend it. This is the method of orthogonal polynomials. You will have noticed, when fitting polynomials by the method described earlier, that each time it was necessary to start the solution from the beginning and solve a new set of normal equations. For example, when fitting a second order model we were unable to use the results of fitting the first order model. However, when the data permit the use of orthogonal polynomial techniques, we can salvage the previous results and simply perform the calculations required to add a new term to the polynomial (of one less degree) determined at the preceding stage.

The method of orthogonal polynomials will be illustrated only for the case where the values of X are equally spaced at unit intervals and where each X has but one Y value associated with it. If the X values are equally spaced at intervals not equal to unity, we may code the X values by dividing through by the length of the common interval and then proceed in the manner to be developed below. If there is more than one Y value associated with each X, the method is not applicable unless we have an equal number of Y values associated with each X value. In the latter case, the complete solution may be obtained by introducing the proper divisor into the calculations. If the X values are unequally spaced, a solution may be obtained [see Kendall (13)], but the operation is so cumbersome that it will not be presented in this text. Thus, in all but the simpler cases, it is usually better for the research worker to use the method described earlier in this chapter. However, if the experimental data are amenable to simple treatment by the method of orthogonal polynomials, the research worker is advised to use that method, for it saves time and also allows him to calculate and evaluate readily, step by step, the contribution made by fitting each additional term in the regression function.

What, then, is the method of orthogonal polynomials? It may be shown that any polynomial, for example,

$$\hat{Y} = b_0 + b_1 X + \cdots + b_k X^k \tag{8.84}$$

may be rewritten as

$$\hat{Y} = A_0 + A_1 \xi'_1 + \cdots + A_k \xi'_k \tag{8.85}$$

in which the $\xi'_i$ $(i = 1, \ldots, k)$ are orthogonal polynomials* and the $A_i$ $(i = 0, \ldots, k)$ are constants defined by

$$A_0 = \sum Y / n = \bar{Y} \tag{8.86}$$

and

$$A_i = \sum Y \xi'_i \Big/ \sum (\xi'_i)^2; \qquad i = 1, \ldots, k. \tag{8.87}$$

For the case we are considering, that is, where X takes on the values $1, 2, \ldots, n$, the first three orthogonal polynomials may be expressed as

$$\xi'_1 = \lambda_1 \xi_1 = \lambda_1 (X - \bar{X}), \tag{8.88}$$

$$\xi'_2 = \lambda_2 \xi_2 = \lambda_2 \left[ (X - \bar{X})^2 - \frac{n^2 - 1}{12} \right], \tag{8.89}$$

$$\xi'_3 = \lambda_3 \xi_3 = \lambda_3 \left[ (X - \bar{X})^3 - (X - \bar{X})\,\frac{3n^2 - 7}{20} \right], \tag{8.90}$$

where the $\lambda_i$ are constants (depending on n) chosen so that the values of the $\xi'$'s are integers reduced to their lowest terms. An abbreviated table of $\xi'$ values is given in Table 8.19; a more complete table may be found in Anderson and Houseman (1).

* Two polynomials are said to be orthogonal if, when X takes on a specified set of values, $\sum \xi'_i \xi'_k = 0$ for $i \neq k$, where the summation means that we first compute the product $\xi'_i \xi'_k$ for each value of X and then obtain the sum of these products.

TABLE 8.19-Partial Table of ξ' Values

                        Degree of Polynomial
        k = 1      k = 2          k = 3                k = 4
X       ξ'1      ξ'1   ξ'2    ξ'1   ξ'2   ξ'3    ξ'1   ξ'2   ξ'3   ξ'4
1       -1       -1    +1     -3    +1    -1     -2    +2    -1    +1
2       +1        0    -2     -1    -1    +3     -1    -1    +2    -4
3                +1    +1     +1    -1    -3      0    -2     0    +6
4                             +3    +1    +1     +1    -1    -2    -4
5                                                +2    +2    +1    +1

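Before considering a numerical example, the equations necessary for calculation of the various sums of squares will be given. They are as follows:

$$\text{S.S. due to } b_0 = A_0 \sum Y = \left( \sum Y \right)^2 \Big/ n \tag{8.91}$$

$$\text{S.S. due to fitting the } i\text{th degree term} = A_i \left( \sum Y \xi'_i \right) = \left( \sum Y \xi'_i \right)^2 \Big/ \sum (\xi'_i)^2. \tag{8.92}$$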


Example 8.7

Consider the data of Table 8.3 and rewrite the values in the form shown in Table 8.20. Equations (8.86) and (8.87) then yield $\hat{Y} = 56.3 + 7.478\xi'_1 - 0.0365\xi'_2 - 0.6801\xi'_3$. If Equations (8.88) through (8.90) are evaluated as far as possible by using the known values of n, $\bar{X}$ and the $\lambda$'s, and then substituted in the regression equation just found, we obtain $\hat{Y} = 21.7277 - 5.8458X + 2.3449X^2 - 0.1134X^3$. The reduction in sum of squares due to fitting the various terms in the regression function may, of course, be calculated using Equations (8.91) and (8.92).
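The calculation of the A's in this example lends itself to a short program. The following sketch, in Python with exact fractions, builds the ξ' values from Equations (8.88) through (8.90) for n = 13, using the λ values 1, 1, and 1/6 listed in Table 8.20, and then applies Equations (8.86), (8.87), and (8.92). The function and variable names are ours.

```python
from fractions import Fraction

# Y values as listed in Table 8.20 (X = 1, 2, ..., 13).
y = [17, 21, 22, 27, 36, 49, 56, 64, 80, 86, 88, 92, 94]
n = len(y)
xbar = Fraction(n + 1, 2)
lam = [Fraction(1), Fraction(1), Fraction(1, 6)]   # lambda values for n = 13

def xi_prime(x):
    """First three orthogonal polynomial values at X = x, Eqs. (8.88)-(8.90)."""
    d = x - xbar
    return [
        lam[0] * d,
        lam[1] * (d ** 2 - Fraction(n * n - 1, 12)),
        lam[2] * (d ** 3 - d * Fraction(3 * n * n - 7, 20)),
    ]

table = [xi_prime(x) for x in range(1, n + 1)]

a0 = Fraction(sum(y), n)                                          # Eq. (8.86)
sum_y_xi = [sum(yv * row[i] for yv, row in zip(y, table)) for i in range(3)]
sum_xi_sq = [sum(row[i] ** 2 for row in table) for i in range(3)]
coef = [s / q for s, q in zip(sum_y_xi, sum_xi_sq)]               # Eq. (8.87)
ss_terms = [s * s / q for s, q in zip(sum_y_xi, sum_xi_sq)]       # Eq. (8.92)

print(float(a0), [round(float(c), 4) for c in coef])
print([round(float(s), 2) for s in ss_terms])
```

Because each ξ' column is orthogonal to the others, adding a fourth-degree term would leave `coef` unchanged; only one further ratio would need to be computed.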

TABLE 8.20-Table for Calculating the A's and Corresponding Sums of Squares

   Y       ξ'1     ξ'2     ξ'3      Yξ'1     Yξ'2     Yξ'3
  17       -6      22     -11      -102      374     -187
  21       -5      11       0      -105      231        0
  22       -4       2       6       -88       44      132
  27       -3      -5       8       -81     -135      216
  36       -2     -10       7       -72     -360      252
  49       -1     -13       4       -49     -637      196
  56        0     -14       0         0     -784        0
  64        1     -13      -4        64     -832     -256
  80        2     -10      -7       160     -800     -560
  86        3      -5      -8       258     -430     -688
  88        4       2      -6       352      176     -528
  92        5      11       0       460     1012        0
  94        6      22      11       564     2068     1034
 732        0       0       0      1361      -73     -389
 λ          1       1     1/6
 Σ(ξ')²   182    2002     572

8.21      SIMPLE   EXPONENTIAL   REGRESSION

The regression function specified by Equation (8.78) is frequently encountered in experimental work. Thus, it is appropriate that some discussion of the associated methods of analysis be given. If the method of least squares is used, the resulting normal equations are not amenable to easy solution. Consequently, some other (approximate) approach is necessary. The usual approximate procedure adopted is to take logarithms (logarithms to the base 10 are most convenient) which results in

$$\log \eta = \log \alpha + (\log \beta) X. \tag{8.93}$$

Redesignating the quantities as follows: $Z = \log \eta$, $A = \log \alpha$, $B = \log \beta$, and $W = X$, Equation (8.93) appears as

$$Z = A + BW \tag{8.94}$$

and we immediately recognize this as being of the same form as Equation (8.12). Estimates of A and B, denoted by $\hat{A}$ and $\hat{B}$, may then be found following the methods described earlier. This solution, which is equivalent to fitting a straight line by least squares to the data when plotted on semi-logarithmic paper, is not identical with a least squares solution of the original problem using Equation (8.78) and ordinary graph paper. However, the approximation is sufficiently accurate for most problems.


Example 8.8

Consider the data of Table 8.21. Using either the method of Section 8.7 or the Abbreviated Doolittle Method, it is determined that $\hat{A} = 0.9469$ and $\hat{B} = 0.002576$.

TABLE 8.21-Protein Content and Proportion of Vitreous Kernels in Samples of Wheat

Sample Number    Proportion of Vitreous Kernels X (= W)    Protein Content Y    Z = log Y
      1                          6                               10.3             1.013
      2                         75                               12.2             1.086
      3                         87                               14.5             1.161
      4                         55                               11.1             1.045
      5                         34                               10.9             1.037
      6                         98                               18.1             1.258
      7                         91                               14.0             1.146
      8                         45                               10.8             1.033
      9                         51                               11.4             1.057
     10                         17                               11.0             1.041
     11                         36                               10.2             1.009
     12                         97                               17.0             1.230
     13                         74                               13.8             1.140
     14                         24                               10.1             1.004
     15                         85                               14.4             1.158
     16                         96                               15.8             1.199
     17                         92                               15.6             1.193
     18                         94                               15.0             1.176
     19                         84                               13.3             1.124
     20                         99                               19.0             1.279

Reproduced from M. Ezekiel, Methods of Correlation Analysis (New York: John Wiley and Sons, Inc., 1941), p. 82. By permission of the author and publishers.
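The estimates quoted in Example 8.8 can be checked with a few lines of code. The sketch below fits Z = A + BW by ordinary least squares to the tabled values of Table 8.21 and then converts back to the scale of Equation (8.78); the variable names are ours, and the rounded Z values from the table are used rather than freshly computed logarithms.

```python
# X (= W) and Z = log10(Y) as listed in Table 8.21.
w = [6, 75, 87, 55, 34, 98, 91, 45, 51, 17,
     36, 97, 74, 24, 85, 96, 92, 94, 84, 99]
z = [1.013, 1.086, 1.161, 1.045, 1.037, 1.258, 1.146, 1.033, 1.057, 1.041,
     1.009, 1.230, 1.140, 1.004, 1.158, 1.199, 1.193, 1.176, 1.124, 1.279]
n = len(w)

w_bar = sum(w) / n
z_bar = sum(z) / n
s_ww = sum((wi - w_bar) ** 2 for wi in w)
s_wz = sum((wi - w_bar) * (zi - z_bar) for wi, zi in zip(w, z))

b_hat = s_wz / s_ww              # estimate of B = log(beta)
a_hat = z_bar - b_hat * w_bar    # estimate of A = log(alpha)

# Back-transform to the constants of eta = alpha * beta**X.
alpha_hat = 10 ** a_hat
beta_hat = 10 ** b_hat

print(round(a_hat, 4), round(b_hat, 6))
print(round(alpha_hat, 3), round(beta_hat, 5))
```

As the text notes, `alpha_hat` and `beta_hat` minimize the sum of squared deviations on the logarithmic scale, not on the original scale, so they are only approximate least squares estimates for Equation (8.78).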

8.22     THE   SPECIAL  CASE:  η = βX

In some instances, it is reasonable to assume that the true regression line passes through the origin. That is, if simple linear regression is being considered, $\beta_0$ in

$$\eta = \beta_0 + \beta_1 X \tag{8.95}$$

is assumed to be 0 and Equation (8.95) is rewritten as

$$\eta = \beta X \tag{8.96}$$

where, of course, $\beta = \beta_1$. It is clear that such an assumption, if justified, will simplify the calculational procedures. It can be verified that for this special case

$$b = \hat{\beta} = \sum XY \Big/ \sum X^2. \tag{8.97}$$

Please do not make the mistake of adopting this simpler form just because it is easier to handle. Further, even if the assumption is justified (such as when X = height and Y = weight of men), it may be better to forego the simplifying assumption and consider $\eta = \beta_0 + \beta_1 X$ as being more appropriate for the range of X values being studied in the experiment.

In summary, the mathematical model should only be chosen after proper consideration has been given to all the factors involved.
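To see how the two models can differ on the same observations, consider the following small sketch. The four data pairs are hypothetical, invented only for illustration; the code computes the slope of the line through the origin from Equation (8.97) and, for comparison, the usual intercept and slope of the two-parameter line.

```python
# Hypothetical observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
n = len(x)

# Line through the origin, Eq. (8.97): b = sum(XY) / sum(X^2).
b_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Ordinary straight line with an intercept, for comparison.
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(round(b_origin, 4), round(b0, 4), round(b1, 4))
```

If the fitted intercept `b0` is far from zero relative to its standard error, the simplified model of Equation (8.96) is questionable over the observed range of X, which is the caution the text gives.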

8.23      WEIGHTED    REGRESSIONS

Suppose the data to be considered are of such a nature that the assumption of homoscedasticity (i.e., homogeneous variances) is no longer justified. That is, suppose we cannot assume that $\sigma^2_{Y|X}$ is the same for all X, but that we must assume

$$\sigma_i^2 = \sigma^2 / w_i \tag{8.98}$$

where the $w_i$ are known constants. If we restrict ourselves to the case where $\eta = \beta_0 + \beta_1 X$, it may be shown that the resulting normal equations are

$$
\begin{aligned}
\left( \sum_{i=1}^{k} n_i w_i \right) b_0 + \left( \sum_{i=1}^{k} n_i w_i X_i \right) b_1 &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} w_i Y_{ij} \\
\left( \sum_{i=1}^{k} n_i w_i X_i \right) b_0 + \left( \sum_{i=1}^{k} n_i w_i X_i^2 \right) b_1 &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} w_i X_i Y_{ij}
\end{aligned}
\tag{8.99}
$$

where

$$Y_{ij} = \beta_0 + \beta_1 X_i + \epsilon_{ij}; \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n_i. \tag{8.100}$$

It is of particular interest to consider the case where $\sigma_i^2$ is proportional to $X_i$, that is, where we may write

$$\sigma_i^2 = \sigma^2 / w_i = \sigma^2 X_i, \tag{8.101}$$

since this is a fairly common occurrence in certain areas of experimentation. Under these conditions, the normal equations reduce to

$$
\begin{aligned}
\left( \sum_{i=1}^{k} \frac{n_i}{X_i} \right) b_0 + (n)\, b_1 &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \frac{Y_{ij}}{X_i} \\
(n)\, b_0 + \left( \sum_{i=1}^{k} n_i X_i \right) b_1 &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}
\end{aligned}
\tag{8.102}
$$

where

$$n = \sum_{i=1}^{k} n_i.$$

In this special case, $\sigma^2$ is estimated by

$$s^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \frac{(Y_{ij} - b_0 - b_1 X_i)^2}{X_i} \Big/ (n - 2). \tag{8.103}$$
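The weighted normal equations form a two-by-two system that is easily solved directly. The following sketch is one way to do so; the function name `weighted_line` and the data are hypothetical, and the variance estimate is the weighted residual sum of squares divided by n - 2, which is our reading of Equation (8.103) when the weights are $1/X_i$.

```python
def weighted_line(xs, ys_by_group, weights):
    """Solve the weighted normal equations (8.99) for b0 and b1.

    xs[i] is the X value of group i, ys_by_group[i] the list of Y values
    observed there, and weights[i] the known constant w_i.
    """
    s_w = sum(len(ys) * w for ys, w in zip(ys_by_group, weights))
    s_wx = sum(len(ys) * w * x for x, ys, w in zip(xs, ys_by_group, weights))
    s_wxx = sum(len(ys) * w * x * x for x, ys, w in zip(xs, ys_by_group, weights))
    s_wy = sum(w * sum(ys) for ys, w in zip(ys_by_group, weights))
    s_wxy = sum(w * x * sum(ys) for x, ys, w in zip(xs, ys_by_group, weights))

    det = s_w * s_wxx - s_wx ** 2
    b0 = (s_wy * s_wxx - s_wx * s_wxy) / det
    b1 = (s_w * s_wxy - s_wx * s_wy) / det

    n = sum(len(ys) for ys in ys_by_group)
    s2 = sum(w * (yv - b0 - b1 * x) ** 2
             for x, ys, w in zip(xs, ys_by_group, weights)
             for yv in ys) / (n - 2)
    return b0, b1, s2

# Hypothetical data whose scatter grows with X; weights w_i = 1 / X_i.
xs = [1.0, 2.0, 4.0]
ys_by_group = [[5.1, 4.9], [8.2, 7.8], [14.5, 13.5]]
weights = [1.0 / x for x in xs]

b0, b1, s2 = weighted_line(xs, ys_by_group, weights)
print(round(b0, 4), round(b1, 4), round(s2, 5))
```

Setting every weight to 1 in the same function gives the ordinary unweighted least squares line, which provides a convenient check on the arithmetic.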

8.24     SAMPLING FROM A BIVARIATE NORMAL POPULATION

Let us consider, as far as practicable, the consequences of obtaining a random sample of values of both X and Y from some bivariate population rather than first choosing values of X and then observing random Y values associated with these chosen X values. What effect will such a procedure have on our estimates? As we have stated the problem, it is much too general to permit a satisfactory answer in this book. However, if we make the assumption that our bivariate population is a bivariate normal population, then we may examine the effect of obtaining random pairs of X and Y rather than choosing X values and then observing the random values of Y associated with the selected values of X.

In this case, two approaches are possible: (1) Obtain the best regression equation for estimating a value of Y associated with a specified value of X; (2) obtain the best regression equation for estimating a value of X associated with a specified value of Y. That is, we can obtain

$$\hat{Y} = b_0 + b_1 X \tag{8.104}$$

as in Section 8.7, or we can obtain

$$\hat{X} = c_0 + c_1 Y \tag{8.105}$$

where

$$c_0 = \bar{X} - c_1 \bar{Y} \tag{8.106}$$

and

$$c_1 = \sum xy \Big/ \sum y^2. \tag{8.107}$$
It is to be noted that the above relations assume no "errors of measurement" in X and Y. If, however, our variables are subject to errors in measurement, so that we really observe $Z = X + \epsilon$ and $W = Y + \delta$, where $\epsilon$ and $\delta$ are independently and normally distributed with mean 0 and variances $\sigma_\epsilon^2$ and $\sigma_\delta^2$, respectively, what estimation procedure may we use? If, in the future, we measure Z and wish to estimate Y, the regression of W on Z should be used; if we measure W and wish to estimate X, the regression of Z on W should be calculated and used.

A reasonable question to ask at this point is: What effects do the above-mentioned errors of measurement have on the accuracy and precision of our estimates? Some answers are:

(1) If the random errors of measurement are associated only with the dependent variable, and are not related to the true values, they will not affect our estimate of the true slope but will cause $s_E^2$ to overestimate $\sigma_E^2$.

(2) If the random errors of measurement are associated only with the independent variable, and are not related to the true values, they not only cause $s_E^2$ to overestimate $\sigma_E^2$ but also tend to produce underestimates of the true slope.

(3) If both variables are subject to error, the consequences are not so easily determined, and much care should be taken when making predictions based on such data.

Suppose, however, that we want to estimate the true relationship between X and Y. To accomplish this we need further information about $\sigma_\epsilon^2$ and $\sigma_\delta^2$. Such information (i.e., estimates $s_\epsilon^2$ and $s_\delta^2$) can sometimes be obtained by making duplicate measurements. However, such a procedure is not always possible. When duplicate measurements are not available, other approaches must be explored. Many scholars have considered the problems associated with regression analyses in which both variables are subject to error, and several solutions have been proposed. However, because no general optimal solution has yet been obtained and because the subject may rightly be considered to be beyond the scope of this text, no attempt will be made to illustrate any of the proposed methods of analysis.

8.25     ADJUSTED    Y  VALUES 

Closely related to the reduction in sum of squares, mentioned several times in this chapter, is the technique of adjusting values of the dependent variable to take account of differences among the associated values of the independent variable. For example, if we are concerned with measurements on the gains in weights of certain animals, a valid comparison among the gains does not seem possible unless we adjust for such a value as the initial weight of the animals or the feed consumed. That is, if one animal gains 60 pounds while consuming 300 pounds of feed, and another animal gains 40 pounds while consuming 200 pounds of feed, we do not feel justified in making a direct comparison between 60 pounds and 40 pounds. We should first attempt to make some adjustment, or correction, for the different amounts of feed consumed. One way to make such an adjustment is through regression. If, from the present or other data, we have an estimated regression function $\hat{Y} = b_0 + b_1 X$, where Y = gain in weight and X = feed consumed, we can adjust the observed gains in weight to some common value for feed consumed. The value of X most commonly selected is the sample mean ($\bar{X}$) but any value will do. The reason why the mean is usually adopted as the point of comparison is that, in general, it is near the center of the range of values of the independent variable.

What, then, is the procedure for determining adjusted Y values? The formula defining adjusted Y values (adjusted to $\bar{X}$, that is) is as follows:

$$\text{adj. } Y = Y - b_1 (X - \bar{X}) \tag{8.108}$$

and the nature of the adjustment is illustrated in Figure 8.8. Here only three sample points have been plotted since these are sufficient to illustrate the technique. It is seen that all the adjusted Y values (represented by circles) appear on the line erected vertically at $\bar{X}$ because we adjusted to $X = \bar{X}$. Note that it is possible to have adjusted $Y_1$ > adjusted $Y_2$ even when $Y_1 < Y_2$. Thus, it is readily apparent that the adjustment of a set of measurements based on a concomitant variable may completely change the entire picture of an experiment. As a consequence, we might reach much different conclusions based on an analysis of adjusted values than would have been reached if no account were taken of the functional relation existing between the dependent and independent variables.

FIG. 8.8-Illustration of adjusted Y values.

It should be evident that adjusted Y values may also be determined when dealing with other than simple linear regression. For example, if a multiple linear regression analysis has been performed and the regression equation determined to be

$$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_k X_k, \tag{8.109}$$

then adjusted Y values are defined by

$$\text{adj. } Y = Y - b_1 (X_1 - \bar{X}_1) - b_2 (X_2 - \bar{X}_2) - \cdots - b_k (X_k - \bar{X}_k). \tag{8.110}$$

Equation (8.110) would, of course, be evaluated using the appropriate sample values $(Y_i, X_{1i}, \ldots, X_{ki})$ and the calculated mean values.

Rather than dwell on the topic of adjusted values at this time, we shall defer further discussion until later in the book where a more efficient method of analysis, namely, covariance analysis, will be introduced.
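The adjustment of Equation (8.108) can be illustrated with the two animals mentioned at the start of this section. In the sketch below the gains and feed figures are those of the text, but the slope b1 = 0.25 pound of gain per pound of feed is a hypothetical value assumed only for the illustration, and the function name `adjust` is ours.

```python
def adjust(y, x, b1, x_ref):
    """Adjusted Y value, Eq. (8.108): remove the part of Y explained by X."""
    return y - b1 * (x - x_ref)

gains = [60.0, 40.0]     # pounds gained (from the text)
feed = [300.0, 200.0]    # pounds of feed consumed (from the text)
b1 = 0.25                # hypothetical regression slope, assumed for illustration

x_bar = sum(feed) / len(feed)
adjusted = [adjust(y, x, b1, x_bar) for y, x in zip(gains, feed)]

print(x_bar, adjusted)
```

With this assumed slope the animal with the larger observed gain has the smaller adjusted gain, which is the kind of reversal the text warns about.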
8.26     THE  PROBLEM  OF  SEVERAL  SAMPLES  OR  GROUPS

In this section, a topic of considerable importance will be discussed. It is: Given several samples or groups of observations, may all the data be pooled into one large sample? This sort of problem has arisen earlier in this book and it is not surprising that it also arises when dealing with regression analyses.

Although the problem can arise regardless of the form of the regression function, the discussion here will be limited to the case of simple linear regression. If other functional forms are pertinent, a statistician should be consulted.

When several sets of sample data are available, the question most frequently asked is: Can one regression line be used for all the data? More specialized questions are:

(1) Taking liberties with the system of notation adhered to up to this point and letting $b_i$ represent the estimate of $\beta_i$, where $b_i$ is the estimated slope for the ith group and $\beta_i$ is the true slope of the regression function in the population from which the ith group is a sample, does $\beta_1 = \beta_2 = \cdots = \beta_k$? In other words, are all the sample slopes estimates of the same true slope?

(2) Assuming $\beta_1 = \beta_2 = \cdots = \beta_k$, would a regression fitted to the group means be linear?

(3) Assuming $\beta_1 = \beta_2 = \cdots = \beta_k$ and that the regression of the group means is linear, is $\beta_W = \beta_M$, where $\beta_M$ is the true regression coefficient for the means and $\beta_W$ is the true pooled within groups regression coefficient?

To mention one case where it is necessary to know the answers to the questions stated above, we cite the technique known as analysis of covariance which we shall study in detail in a later chapter. This technique has as one of its basic assumptions the requirement that the same regression coefficient, $\beta$, apply to all groups. Hence, the need for an appropriate test is clear. Let us now outline the general procedure to be followed. Suppose we have k groups and $n_i$ observations (on both X and Y) in the ith group. We may present most of the necessary calculations in Table 8.22, where


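$$A_i = \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2 = \sum_{j=1}^{n_i} X_{ij}^2 - \left( \sum_{j=1}^{n_i} X_{ij} \right)^2 \Big/ n_i, \tag{8.111}$$

$$B_i = \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)(Y_{ij} - \bar{Y}_i) = \sum_{j=1}^{n_i} X_{ij} Y_{ij} - \left( \sum_{j=1}^{n_i} X_{ij} \right) \left( \sum_{j=1}^{n_i} Y_{ij} \right) \Big/ n_i, \tag{8.112}$$

$$C_i = \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 = \sum_{j=1}^{n_i} Y_{ij}^2 - \left( \sum_{j=1}^{n_i} Y_{ij} \right)^2 \Big/ n_i, \tag{8.113}$$

and

$$A_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij}^2 - \left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij} \right)^2 \Big/ \sum_{i=1}^{k} n_i, \tag{8.114}$$

$$B_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X})(Y_{ij} - \bar{Y}) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij} Y_{ij} - \left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij} \right) \left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij} \right) \Big/ \sum_{i=1}^{k} n_i, \tag{8.115}$$

$$C_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 - \left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij} \right)^2 \Big/ \sum_{i=1}^{k} n_i. \tag{8.116}$$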
If we designate by $S_2$ the sum of squares among the k group regression coefficients, that is, if $S_2$ is a sum of squares expressing the amount of variation among $b_1, b_2, \ldots, b_k$, where $b_j$ is the regression coefficient in the jth group, it may be shown that

$$S_2 = \sum_{i=1}^{k} \frac{B_i^2}{A_i} - \frac{B_W^2}{A_W}, \tag{8.117}$$

where $A_W = \sum A_i$ and $B_W = \sum B_i$ are the pooled within groups values. Similarly, we may designate the sum of squares of the deviations of the Y-means from the regression of Y-means on X-means by $S_3$, where this is calculated as shown in Table 8.22. The square of the difference between the regression coefficient computed from the "pooled within" values ($b_W$) and the regression coefficient for the regression of the means ($b_M$) is given by $(b_W - b_M)^2$. If we multiply this by a suitable factor, it becomes another estimate of $\sigma_E^2$, assuming, of course, a constant variance of Y for all X. This may be expressed as

$$S_4 = \ldots \tag{8.118}$$

and noting that in Table 8.22 we defined $S_1$ as the pooled sum of squares of deviations from regression, we can show that

$$S_T = S_1 + S_2 + S_3 + S_4. \tag{8.119}$$

Furthermore, the degrees of freedom associated with $S_T$ may be subdivided in the following fashion:

$$\sum_{i=1}^{k} n_i - 2 = \left( \sum_{i=1}^{k} n_i - 2k \right) + (k - 1) + (k - 2) + 1, \tag{8.120}$$

and these are associated with $S_1$, $S_2$, $S_3$, and $S_4$, respectively.

Now we are in a position to answer the questions posed at the beginning of this section. Let us consider these in order and indicate the proper test procedures.

1. Can one regression line be used for all the observations?

$$F = \frac{(S_2 + S_3 + S_4)/(2k - 2)}{S_1 \Big/ \left( \sum_{i=1}^{k} n_i - 2k \right)}. \tag{8.121}$$

2. To test $H\!: \beta_1 = \cdots = \beta_k$: This test and the succeeding ones are usually considered if F in Equation (8.121) turns out to be significant. That is, we are curious as to the reason for the significance.

$$F = \frac{S_2/(k - 1)}{S_1 \Big/ \left( \sum_{i=1}^{k} n_i - 2k \right)}. \tag{8.122}$$

3. To test whether regression of means is linear (assuming $\beta_1 = \cdots = \beta_k$):

$$F = \frac{S_3/(k - 2)}{(S_1 + S_2) \Big/ \left( \sum_{i=1}^{k} n_i - k - 1 \right)}. \tag{8.123}$$

4. To test $H\!: \beta_W = \beta_M$ (assuming regression of means is linear and $\beta_1 = \cdots = \beta_k$):

$$F = \ldots \tag{8.124}$$

It should be clear that the order in which these tests are performed is very important since the assumptions necessary for the later tests are tested as hypotheses in the earlier tests. Note also that if a sequence of tests is applied, the critical level (true probability of Type I error) of the sequence is not known though it is needed for proper interpretation of the results.
Example 8.9

Consider Table 8.23. To test $H\colon \beta_1 = \beta_2 = \beta_3$ using $\alpha = 0.01$, we calculate $F = 150/15.667 = 9.57$ with $\nu_1 = 2$ and $\nu_2 = 300$ degrees of freedom. Since $F = 9.57 > F_{0.99}(2, 300)$, the stated hypothesis is rejected. The performance of the remaining tests is left as an exercise for the reader.

TABLE 8.23-Hypothetical Data to Illustrate the Procedure for Testing the Hypothesis $H\colon \beta_1 = \beta_2 = \beta_3$

            Degrees of                                                          Degrees of    Mean
Group       Freedom     $\sum x^2$   $\sum xy$   $\sum y^2$   $\sum y^2 - (\sum xy)^2/\sum x^2$   Freedom    Square

A             101          400          800         4000              2400                 100
B             101          200          600         3000              1200                 100
C             101          400          600         2000              1100                 100
(pooled)                                                              4700                 300       15.667

Total         303         1000         2000         9000              5000                 302

Difference for testing $H\colon \beta_1 = \beta_2 = \beta_3$           300                   2      150
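The arithmetic of Example 8.9 can be checked with a short script. This is a sketch, not part of the original text: the dictionary layout and variable names are illustrative assumptions, while the sums of squares and products are those of Table 8.23. The tail probability uses the closed form $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$, which holds only when the numerator has exactly 2 degrees of freedom, as it does here.

```python
# Sums of squares and products (deviations from group means), Table 8.23.
groups = {
    "A": {"df": 101, "sxx": 400, "sxy": 800, "syy": 4000},
    "B": {"df": 101, "sxx": 200, "sxy": 600, "syy": 3000},
    "C": {"df": 101, "sxx": 400, "sxy": 600, "syy": 2000},
}


def residual_ss(sxx, sxy, syy):
    """Sum of squares of deviations from a fitted straight line."""
    return syy - sxy ** 2 / sxx


# Separate slope for each group: pool the within-group residuals.
pooled_ss = sum(residual_ss(g["sxx"], g["sxy"], g["syy"]) for g in groups.values())
pooled_df = sum(g["df"] - 1 for g in groups.values())

# One common slope: residual computed from the column totals.
total_ss = residual_ss(
    sum(g["sxx"] for g in groups.values()),
    sum(g["sxy"] for g in groups.values()),
    sum(g["syy"] for g in groups.values()),
)
total_df = sum(g["df"] for g in groups.values()) - 1

diff_ss = total_ss - pooled_ss      # attributable to unequal slopes
diff_df = total_df - pooled_df
f_ratio = (diff_ss / diff_df) / (pooled_ss / pooled_df)

# Closed-form upper tail of F; valid only for 2 numerator degrees of freedom.
assert diff_df == 2
p_value = (1.0 + 2.0 * f_ratio / pooled_df) ** (-pooled_df / 2.0)

print(f"F = {f_ratio:.3f} on ({diff_df}, {pooled_df}) d.f., p = {p_value:.2e}")
```

Because the tail probability falls well below 0.01, the script agrees with the rejection of the hypothesis of equal slopes reported in the example.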

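Granting the assumption that two populations have a common variance, the hypothesis $H\colon \beta_1 = \beta_2$ may be tested against the alternative $\beta_1 \neq \beta_2$ by examining

$$t = (b_1 - b_2)/s_{b_1 - b_2}, \tag{8.125}$$

$$s_{b_1 - b_2}^2 = s_p^2 \left[ \frac{1}{\sum_{j=1}^{n_1} x_{1j}^2} + \frac{1}{\sum_{j=1}^{n_2} x_{2j}^2} \right], \tag{8.126}$$

where

$$\sum_{j=1}^{n_i} x_{ij}^2 = \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2, \qquad i = 1, 2,$$

and

$$s_p^2 = \frac{\sum_{j=1}^{n_1} (Y_{1j} - \hat{Y}_{1j})^2 + \sum_{j=1}^{n_2} (Y_{2j} - \hat{Y}_{2j})^2}{n_1 + n_2 - 4}. \tag{8.127}$$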


The value of t specified by Equation (8.125) is, of course, distributed as "Student's" t with $\nu = n_1 + n_2 - 4$ degrees of freedom. Confidence limits for $\beta_1 - \beta_2$ may also be obtained by use of the foregoing equations. It is to be noted that economies in calculation may be achieved by selecting, whenever possible, the same X values for both samples.
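The comparison of two slopes described above might be computed along the following lines. This is a minimal sketch under the stated common-variance assumption; the function names and the two small data sets are hypothetical, chosen only to make the example runnable, and both samples deliberately use the same X values, as the text recommends.

```python
from math import sqrt


def slope_parts(x, y):
    """Return (slope, sum of squared x-deviations, residual sum of squares)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / sxx, sxx, syy - sxy ** 2 / sxx


def compare_slopes(x1, y1, x2, y2):
    """t statistic and degrees of freedom for H: beta_1 = beta_2."""
    b1, sxx1, rss1 = slope_parts(x1, y1)
    b2, sxx2, rss2 = slope_parts(x2, y2)
    df = len(x1) + len(x2) - 4
    pooled_var = (rss1 + rss2) / df          # pooled variance about regression
    se_diff = sqrt(pooled_var * (1.0 / sxx1 + 1.0 / sxx2))
    return (b1 - b2) / se_diff, df


# Hypothetical data, same X values in both samples.
x = [1, 2, 3, 4, 5]
y_first = [2.1, 3.9, 6.2, 7.8, 10.1]
y_second = [1.0, 2.2, 2.9, 4.1, 5.0]

t_stat, df = compare_slopes(x, y_first, x, y_second)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The resulting t would be referred to a table of "Student's" t with the printed degrees of freedom; a confidence interval for the difference follows from the same standard error.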

8.27     SOME  USES  OF    REGRESSION  ANALYSIS 

The  uses  to  which  regression  techniques  may  be  put  are  numerous.  A 
few  of  the  more  important  are : 
(1)  To reduce the length of a confidence interval when estimating some population mean (or total) by considering the effect of concomitant variables.

(2)  To eliminate certain "environmental" effects from our estimates of treatment effects; that is, we may wish to examine adjusted Y-values.

(3)  To predict Y knowing values of $X_1, \cdots, X_k$ (our auxiliary variables) whether or not a causal relationship exists.

(4)  To influence the outcome of the dependent variables assuming, of course, that we have a causal relationship.

There  are  many  other  uses  for  regression  methods  which  might  have 
been  listed.  We  have  not  attempted  to  exhaust  the  possibilities,  nor 
have  we  attempted  to  give  our  examples  in  any  order  of  importance. 
The  relative  importance  of  the  different  uses  will  vary  depending  on 
the  subject  matter  being  discussed. 

Problems 

8.1   Derive the normal equations specified by Equation (8.14).

8.2   Derive Equation (8.21).

8.3   Derive Equation (8.22) from Equation (8.25).

8.4   Given the following values, find: (a) $b_1$, (b) $s_E$, (c) $s_{b_1}$.

      $\sum x^2 = 121$,   $\sum xy = 20$,   $\sum y^2 = 82$,   $n = 10$

8.5   Find the linear regression of Y on X given the values:

      X:   3   8   4   11   9
      Y:   5   3   4    1   2

8.6   Given that

      $n = 277$,   $\bar{X} = 65$,   $\bar{Y} = 72$,   $\sum x^2 = 1600$,   $\sum y^2 = 3600$,   $\sum xy = 2000$,

      compute: (a) $s_E$, (b) $s_{b_1}$, (c) $s_{\hat{Y}}$ for X = 45.

8.7   Given the abbreviated analysis of variance shown below, perform the following:

      (a)  Test $H\colon \beta_1 = 0$ using $\alpha = 0.01$.

      (b)  Compute the standard error of estimate, $s_E$.


      Source of Variation             Degrees of    Sum of     Mean
                                      Freedom       Squares    Square

      Due to regression                   1            40        40
      Deviations about regression        50           200         4

      Total                              51           240

8.8   Given that

      $n = 38$,   $\bar{X} = 5$,   $\bar{Y} = 40$,   $\sum x^2 = 100$,   $\sum y^2 = 10{,}000$,   $\sum xy = -800$,

      answer the following:

      (a)  Determine $\hat{Y} = b_0 + b_1 X$.

      (b)  Test $H\colon \beta_1 = 0$ using $\alpha = 0.05$.

      (c)  Partition $\sum y^2$ into two parts, one associated with the slope of the linear regression and the other associated with the deviations about regression.

      (d)  For the observation (X = 8, Y = 36), compute the adjusted value of Y.

      (e)  Interpret both $b_1$ and $\beta_1$.

8.9   Given that

      $n = 62$,   $\bar{X} = 10$,   $\bar{Y} = 20$,   $\sum x^2 = 40$,   $\sum y^2 = 250$,   $\sum xy = -80$,

      solve the following:

      (a)  Determine $\hat{Y} = b_0 + b_1 X$.

      (b)  Compute a 99 per cent confidence interval for $\beta_1$. State all assumptions.

      (c)  Estimate the gain in information from using X as a statistical control (the regression of Y on X) in estimating the population mean of Y. (NOTE: Information is here used as a synonym for "the reciprocal of the variance.")

8.10  Given that $b_1 = 0.2$ grams of gain per gram of feed eaten, find the net difference between the gains of two rats where one animal consumed 200 grams of feed and gained 60 grams while the other animal consumed 300 grams and gained 90 grams.

8.11  The data in the table given below represent the heights (X) and the weights (Y) of several men. We selected the heights in advance and then observed the weights of a random group of men having the selected heights.

      X (in.)     Y (lbs.)

        60          110
        61          135
        60          120
        61          126
        62          140
        60          130
        62          135
        65          158
        64          145
        70          170
        72          185
        70          180

      (a)  Plot a scatter diagram.

      (b)  Obtain the estimated regression line $\hat{Y} = b_0 + b_1 X$.

      (c)  Calculate and interpret a 90 per cent confidence interval for $\beta_1$.

      (d)  Calculate and interpret a 98 per cent confidence interval for $\beta_0$.

      (e)  Calculate and interpret a 95 per cent confidence interval for the mean of Y at X = 66.

      (f)  Test the hypothesis $H\colon \beta_1 = 0$.

      (g)  Test the hypothesis $H\colon \beta_1 = 6$.

      (h)  Test the hypothesis $H\colon \beta_0 = -30$.
      (i)  Predict the weight of an individual who is 66 inches in height. Give a "prediction interval."

      (j)  Estimate the height of a man whose recorded weight is 170 pounds. Give both point and interval estimates.

      (k)  Test for "linearity of regression."

      (l)  What proportion of the variation in Y is "explained" by the regression of weight on height?

      (NOTE: Give all assumptions and use a probability of Type I error equal to .05 in each test.)

8.12  Assuming  the  data  given  in  Problem  4.3  to  be  a  random  sample  from  a 
bivariate    normal   population,    (a)    calculate   the   regression   for   esti 
mating  weight  from  height,  (b)  calculate  the  regression  for  estimating 
height  from  weight,  (c)  plot  a  scatter  diagram  and  show  both  regres 
sion  lines  thereon. 

8.13  The Consumer Market Data Handbook, 1939 edition, U.S. Department of Commerce, lists consumer market data by states, counties, and cities. Among the types of information listed are "Population and Dwellings," "Volume and Type of Business and Industry, 1935," "Employment and Payrolls, 1935," "Retail Distribution by Kinds of Business, 1935," and "Related Indicators of Consumer Purchasing Power." Among the latter are numbers of income tax returns, automobile registrations, radios, telephones, electric meters, and magazine subscribers.

      Such information as listed above might be used by national advertising agencies, large sales organizations, and by individual retail or manufacturing agencies for various purposes in planning their business activities. The numbers and kinds of analyses which might be considered for such data are large. We have selected only a small portion for study in this problem.

      The data given here present the filling station sales per capita (Y) and the automobile registrations per 1000 persons (X) for a group of Iowa counties.
      IOWA CONSUMER DATA

                        Per Capita Sales       Automobile
                        of Filling Stations    Registrations
      County            (yearly)               per 1000 Persons

      Adair                  $17                    206
      Adams                   25                    233
      Allamakee               16                    237
      Appanoose               13                    183
      Audubon                 28                    243
      Benton                  27                    230
      Black Hawk              20                    272
      Boone                   21                    214
      Bremer                  22                    314
      Buchanan                16                    263
      Buena Vista             32                    314
      Butler                  27                    295
      Calhoun                 27                    273
      Carroll                 21                    279
      Cass                    30                    283
      Cedar                   21                    276
      Cerro Gordo             23                    265
      Cherokee                43                    254
      Chickasaw               23                    264
      Clarke                  32                    194
      Clay                    23                    285
      Clayton                 14                    255
      Clinton                 21                    232
      Crawford                19                    238
      Dallas                  24                    271
      Davis                   18                    224
      Decatur                 12                    203
      Delaware                22                    230

      (a)  Plot these data on an 8 x 11 sheet of graph paper. On the abscissa or X-axis place automobile registrations and on the ordinate or Y-axis plot sales per capita. Plot the point $(\bar{X}, \bar{Y})$ from the results to be obtained below.

      (b)  Calculate the means, $\bar{X}$ and $\bar{Y}$, and the standard deviations for X and Y.

      (c)  Fit a straight line to the plotted points by obtaining the regression of Y on X as a least squares fit. What is the model in this case, i.e., for a single observation, county per capita sales by filling stations? What parameters do the statistics $b_0$ and $b_1$ estimate? Explain in words the meaning of $b_0$ and $b_1$, that is, interpret the results of your analysis.

      (d)  Plot $\hat{Y} = b_0 + b_1 X$ on the scattergram.

      (e)  Calculate $\hat{Y}$ and $Y - \hat{Y}$ for each X.

      (f)  Calculate $(Y - \hat{Y})^2$ for each X and thus obtain $\sum (Y - \hat{Y})^2$. Compare this value with $\sum y^2 - b_1 \sum xy$. Then obtain $s_E$ and explain its relationship to the values of $Y - \hat{Y}$.

      (g)  Determine 95 per cent confidence limits for $\beta_1$.

8.14  On  the  basis  of  the  following  tabulations  comparing  years  of  service 
with  ratings,  the  management  seeks  to  discover  whether  or  not 
there  is  a  distinct  tendency  to  rate  old  employees  higher  than 
more  recent  additions  to  the  working  force. 


                   Service                                 Service
      Employee*    (in years)    Rating      Employee      (in years)    Rating

         A             1            5            K             6            9
         B             9            6            L             7            4
         C             8            8            M             1            2
         D             3            8            N             1            3
         E             3            6            O             3            8
         F             2            7            P             1            6
         G             4            5            Q             2            5
         H             5            6            R             2            3
         I             5            4            S             4            4
         J             6            5            T             2            7

      * Source: G. R. Davies, Business Statistics, p. 338.

      (a)  Plot a scatter diagram (X = service, Y = rating).

      (b)  Obtain the regression line $\hat{Y} = b_0 + b_1 X$.

      (c)  Compute $s_E$.

      (d)  Compute $s_{b_1}$.

      (e)  Set your results out in an analysis of variance table.

      (f)  Test $H\colon \beta_1 = 0$ using (1) a t-test and (2) an F-test.

      (g)  Estimate the average rating which might be given an employee with (1) 4 years' service, (2) 15 years' service. Give both point and interval estimates. Discuss the validity of these estimates.

      (h)  Estimate, by interval, what rating an individual employee with 4 years' service might be expected to receive.

      (NOTE: Whenever necessary, state all the assumptions made in order to use the techniques involved.)

8.15  Assume you are an investment counselor for a large insurance company. As one of your duties, you would need to have some idea of the amount of policy loans per year, i.e., loans to policyholders, using their life insurance policies as collateral. Suppose you wish to estimate the total amount of policy loans your company will make during the coming year. Assume the date to be January 1, 1948. You are given the data sheet shown below. (1) What methods of estimation might you use and what would your estimates be? (2) What further information might you request in order to do a better job? (3) Give reasons for the answers you make to (2).
                                       Estimated          Policy Loans Made
               National Income*        Population         by U.S. Life Insurance
               (in millions            of U.S.A.†         Companies‡
      Year     of dollars)             (in thousands)     (in millions of dollars)

      1929         87,355                121,770                2,379
      1930         75,003                123,077                2,807
      1931         58,873                124,040                3,369
      1932         41,690                124,840                3,806
      1933         39,584                125,579                3,769
      1934         48,613                126,374                3,658
      1935         56,789                127,250                3,540
      1936         64,719                128,053                3,411
      1937         73,627                128,825                3,399
      1938         67,375                129,825                3,389
      1939         72,532                130,880                3,248
      1940         81,347                131,970                3,091
      1941        103,834                133,203                2,919
      1942        136,486                134,665                2,683
      1943        168,262                136,497                2,373
      1944        182,407                138,083                2,134
      1945        181,731                139,586                1,962
      1946        179,289                141,235                1,891
      1947        202,500                144,034                1,937
      1948        224,400§               146,571

      * Statistical Abstract of the United States, 1949, p. 281.
      † Op. cit., p. 7.
      ‡ Life Insurance Fact Book, 1949, p. 67.
      § Estimated.

8.16  Let  us  assume  that  one  of  your  duties  is  that  of  preparing  reports  for 
the  managing  director  of  the  firm.  They  are  engaged  in  manufacturing 
and  are,  of  course,  interested  in  the  average  cost  per  unit  of  production. 
Obviously,  units  of  production  are  easily  measured,  but  average  cost 
requires  lengthy  and  difficult  computations.  If  some  relationship 
between  these  two  quantities  can  be  determined  empirically,  an 
estimation  procedure  may  be  employed.  From  past  records,  the 
following  data  are  available: 
      Y = Average Cost     X = Units Produced
      (in cents)           (in thousands)

            1.1                   9
            1.9                  13
            3.5                   5
            5.9                  17
            7.4                  18
            1.4                   8
            2.6                  14
            1.4                  12
            1.9                   7
            3.5                  15
            1.0                  10
            1.1                  11
            4.6                   4
           17.9                  23

      (a)  What methods would you employ to have available a means of estimating values of Y if X were known?

      (b)  Describe briefly what devices you would use to determine the type of curve that would best fit a given set of data.

8.17  An advertising concern is interested in prorating sales by counties for Maryland. In hopes of using magazine circulation per 1000 population to aid them, they obtained the following data:

      Magazine Circulation per     Per Capita
      1000 Population (X)          Sales (Y)

              159                     279
              114                     184
               67                     137
               79                     126
              112                     213
              124                     184
              129                     181
               58                     133
               85                     161
              127                     228
               64                     129
              131                     182
               75                     142
              116                     199
              141                     268
              133                     189
               76                     161
               48                     105
               68                     102
              127                     235
              150                     259
              136                     232
              114                     216

      $\sum Y = 4245$,   $\sum Y^2 = 841{,}133$,   $\sum XY = 482{,}786$

      (a)  Determine the regression equation.

      (b)  If the circulation in County A were 90, what would you estimate the per capita sales to be? What is the standard error of $\hat{Y}$?

      (c)  Is the regression coefficient significant?

      (d)  What are your assumptions? Are they justified?

8.18  Given  the  following  data  satisfying  the  normality  and  homogeneous 
variance  assumptions,  do  you  believe  that  the  true  regression  is 
actually  linear? 


X 


4 
3 
6 

7 


18 
19 
18 
13 


26 

25 
24 
21 


38 
35 
28 
31 


44 
43 
39 
38 
8.19  For the following data, test the hypothesis that $\beta_1 = \beta_2 = \beta_3 = \beta_4$, where we assume normality, etc., as required for simple linear regression.

                  Degrees of
      Samples     Freedom       $\sum x^2$     $\sum xy$     $\sum y^2$

         A            67           300            312           550
         B            75           500            515           758
         C           115           200            216           375
         D            34           200            300           500

8.20  Using  the  data  given  below,  fit  a  second  degree  polynomial  (parabola) 
for  gross  profits  per  farm  against  months  of  labor  to  obtain  a  curve 
for  Iowa  farms. 


                    Gross       Months                       Gross       Months
                    Profits     of Labor                     Profits     of Labor
      Farm No.*     Y           X           Farm No.*        Y           X

          1         16.7         20             18           11.2         14
          2         17.9         19             19            9.4         15
          3         17.4         24             20            8.7         12
          4         14.9         15             21           12.2         17
          5         16.2         24             22            7.7         14
          6         14.0         15             23           11.5         14
          7         15.1         24             24            7.3         13
          8         18.3         24             25           11.8         16
          9         11.3         16             26           15.1         23
         10         18.3         26             27           10.5         33
         11         17.1         24             28           17.0         29
         12         12.0         16             29           15.6         30
         13         15.2         25             30           13.2         31
         14         16.2         28             31           17.2         22
         15         14.9         24             32           14.6         32
         16         10.5         15             33           12.2         34
         17         16.5         27             34            9.8         36

      * Source: Selected values from Iowa farm records plus some supplementary hypothetical observations.


8.21 


with 


=  244, 


8.22 


Given     the     linear     regression: 

Z^^"2  =  58,000,  and  n  =  100: 

(a)  What is the standard error of $\hat{Y}$ for X = 254?

(b)  For what X value is its variance a minimum?

(c)  Given that information is the reciprocal of the variance, how may we maximize our information about the unknown parameter $\beta_1$ in estimating a linear regression similar to the above?

In   a   regression   study  the   following   preliminary    calculations   were 

made:  3?  =  20;  T  =  22;  23  (X—  JT)2 


(a)  What is the estimate for the population regression coefficient?

(b)  How do you interpret the population regression coefficient $\beta_1$?

(c)  Obtain the regression equation in the form $\hat{Y} = b_0 + b_1 X$.

(d)  Test the hypothesis $H\colon \beta_1 = 1$.

8.23  An  economist  from  the  University  of  Hawaii  and  an  economist  from 
the  University  of  Chicago  were  comparing  their  studies  of  income  and 
the  consumption  of  various  goods.  Among  the  items  studied  was 
gasoline  for  use  in  private  automobiles.  Each  had  used  a  sample  of 
about  100  university  employees  chosen  to  cover  the  range  of  salaries 
and  wages.  The  Chicago  economist  reported  that  he  had  observed  an 
increase  in  gas  consumption  of  10  gallons  per  $100  increase  in  income 
while  the  Hawaii  economist  noted  an  increase  of  only  4  gallons  per 
$100 increase in income. They then looked at the variances of their regression coefficients and gave these figures, $V(b_C) = 2.41$ and $V(b_H) = 2$.

      (a)  Could the observed difference between the regression coefficients be expected to occur more than once in 20 times by chance if we consider the necessary assumptions for such a test to be fulfilled?

      (b)  Would your conclusion be changed if the change in gas consumption had been reported as 0.1 gallon per $1 increase for Chicago and .04 gallon per $1 increase for Hawaii? Or what would the variance of $b_H$ be, if the regression coefficients had been reported in the latter unit, per dollar increase in income?

      (c)  When the Hawaii economist tested the hypothesis ($\beta_H = 0$), he obtained a t-value of $2.828 = (4 - 0)/\sqrt{2}$. Approximately what F-value would he have obtained if he had examined the reduction in sum of squares due to linear regression by preparing an analysis of variance?

      [NOTE: $V(b)$ is another way of expressing $s_b^2$; for example, $V(b_C) = s_{b_C}^2$.]

8.24      The  performance  of  a  tensile  strength  test  on  a  specific  metal  yielded 
the  following  results: 

Brinell  Hardness  Tensile  Strength 

Number  (1OOO  psi) 

X  Y 

104  38.9 

106  40.4 

106  39.9 

106  40.8 

102  33.7 

104  39.5 

102  33.0 

104  37.0 
102  33.2 

102  33.9 
101  29.9 

105  39.5 

106  40.6 

103  35.1 
      (a)  Determine the best linear regression equation by least squares and obtain confidence limits for estimating the mean tensile strength associated with a specified Brinell number.

      (b)  Is any functional form other than a linear equation indicated by these data? Make the appropriate test and discuss your results.

8.25  Using the data of Table 8.10, obtain the following regression equations:

      (a)  $\hat{Y} = b_0 + b_1 X_1$
      (b)  $\hat{Y} = b_0 + b_2 X_2$
      (c)  $\hat{Y} = b_0 + b_3 X_3$
      (d)  $\hat{Y} = b_0 + b_4 X_4$.

      For each regression equation, perform a complete analysis. Comment on the four different values of $b_0$. Also, compare the results of this problem with the multiple regression analysis obtained in Example 8.5.

8.26  The  solubility  of  nitrous  oxide  in  nitrogen  dioxide  was  investigated 
with  the  following  results: 


Reciprocal  Temperature 

(=  10OO/  degrees  absolute) 

3.801         3.731 

3.662        3.593        3.533 

Solubility 

1.28           1.21 

1.11           0.81           0.65 

(per  cent  by  weight) 

1.33           1.27 

1.04          0.82          0.59 

1.52 

0.63 

Perform  a  complete  regression  analysis  and  interpret  your  results. 

8.27  A  Rockwell  hardness  test  is  fairly  simple  to  perform.  However,  the 
determination  of  abrasion  loss  is  difficult.   In  an  attempt  to  find  a 
way  of  predicting  abrasion  loss  from  a  measurement  of  hardness,  an 
experiment  was  run  and  data  collected  on  30  samples.  The  following 
results  were  obtained: 

      $\bar{X} = 70.27$,   $\bar{Y} = 175.4$,   $\sum x^2 = 4300$,   $\sum y^2 = 225{,}011$,
      $\sum xy = -22{,}946$,   $s_E^2 = 3663$,   and   $\hat{Y} = 550.4 - 5.336X$.

Estimate  the  abrasion  loss  when  hardness  is  70.  Discuss  the  usefulness 
of  the  prediction  equation. 

8.28  A  gauge  is  to  be  calibrated  using  dead  weights.  If  X  represents  the 
standard  and  Y  the  gauge  reading,  perform  a  linear  regression  analysis 
based  on  the  following  results  from  10  observations: 

      $\bar{X} = 230$,   $\bar{Y} = 226$,   $\sum xy = 1532$,   $\sum x^2 = 1561$,   $\sum y^2 = 1539$.

      Test $H\colon \beta_1 = 1$ using $\alpha = 0.01$.
8.29  Elongation of steel plate (Y) is related to the applied force in psi (X). Given the data


X 


1.33 

26 

2.68 

51 

3.50 

66 

4.40 

84 

5.35 

101 

6.27 

117 

7.11 

133 

8.93 

150 

9.76 

182 

10.81 

202 

perform  a  complete  regression  analysis  and  interpret  your  results. 

8.30  It  is  desired  to  determine  the  relationship  of  a  twisting  movement  to 
the  amount  of  strain  imposed  on  a  piece  of  test  metal.  Eight  samples 
were  obtained  and  the  following  data  observed: 

Twisting Movement (X)  Strain (Y)

100  112 

300  330 

500  546 

700  770 

900  1010 

1000  1100 

1200  1323 

1300  1515 

Determine the "best" relationship between X and Y. Interpret your results.

8.31  The data given below and identified as Y, $X_1$, and $X_2$ represent annual figures for 1919 to 1943, a 25-year period, for three adjacent counties in the semiarid central area of South Dakota. $Y_i$ is the average yield of oats in the ith year. $X_{1i}$ is preseason precipitation in inches, e.g., 9.82 for $X_{11}$ is the rainfall from August, 1918, to March 31, 1919, etc. $X_{2i}$ is the growing season precipitation in inches. This rainfall covers the period April 1 to July 31 for each crop year listed. Due to the nature of weather and yield data, we may assume that these data fulfil our necessary assumptions for multiple linear regression. Do a complete analysis and interpretation of the data. The reader should note, though, that these are time series data, and thus an ordinary multiple linear regression analysis may be of doubtful validity.
      Year       Y        $X_1$      $X_2$

      1919      30.8       9.82      14.85
      1920      34.2       9.12      17.30
      1921      14.3       6.24       9.92
      1922      34.5      14.06       9.33
      1923      32.7       5.29      12.01
      1924      36.0       7.74      10.87
      1925      33.8       9.40      11.78
      1926       3.7       4.22       7.14
      1927      26.1       8.11      14.44
      1928      18.6       6.30       8.95
      1929      15.0      10.58       6.15
      1930      23.8       8.62       8.63
      1931       4.4      10.53       6.19
      1932      23.5       7.05       8.86
      1933       0.1       7.75       7.97
      1934       0.0       4.41       4.93
      1935      19.7       7.05      11.27
      1936       0.0       6.90       5.37
      1937       4.5       7.97       8.78
      1938      14.4       5.41      10.37
      1939      13.4       7.30       8.78
      1940      11.8       5.94       7.06
      1941      22.2       6.77      10.44
      1942      42.9      11.23      14.58
      1943      24.6       8.55       9.57

8.32  A study of 18 regions gives the following data on suicide rate, age, per cent male, and business failures. Fit an equation for the linear regression of Y on $X_1$, $X_2$, and $X_3$, where

      Y     = suicide rate
      $X_1$ = age
      $X_2$ = per cent male
      $X_3$ = business failures

      and analyze completely. The summary of the data follows:

      $\sum Y     = 285.3$            $\sum YX_1    = 8536.6165$
      $\sum X_1   = 531.09$           $\sum YX_2    = 14500.1161$
      $\sum X_2   = 911.95$           $\sum YX_3    = 29644.847$
      $\sum X_3   = 1800$             $\sum X_1X_2  = 26913.822$
      $\sum Y^2   = 4905.6904$        $\sum X_1X_3  = 53614.575$
      $\sum X_1^2 = 15731.2223$       $\sum X_2X_3  =$ 91U31.630
      $\sum X_2^2 = 46218.4473$
      $\sum X_3^2 = 199843.52$

8.33  Do a complete multiple linear regression analysis of the following data. Interpret your results.
      Column definitions:
        $X_1$ = cholesterol dosage (gm. per day)
        $X_2$ = average blood total cholesterol (mg.)
        $X_3$ = initial weight (kg.)
        $X_4$ = ratio of final weight to initial weight
        $X_5$ = average food intake per kg. initial weight (gm. per day)
        Y     = degree of atherosclerosis

      No.    $X_1$    $X_2$    $X_3$    $X_4$    $X_5$    Y

       1      30       424      2.46     0.90     18      2
       2      30       313      2.39     0.91     10      0
       3      35       243      2.75     0.95     30      2
       4      35       365      2.19     0.95     21      2
       5      43       396      2.67     1.00     39      3
       6      43       356      2.74     0.79     19      2
       7      44       346      2.55     1.26     56      3
       8      44       156      2.58     0.95     28      0
       9      44       278      2.49     1.10     42      4
      10      44       349      2.52     0.88     21      1
      11      44       141      2.36     1.29     56      1
      12      44       245      2.36     0.97     24      1
      13      45       297      2.56     1.11     45      3
      14      45       310      2.62     0.94     20      2
      15      45       151      3.39     0.96     35      3
      16      45       370      3.57     0.88     15      4
      17      45       379      1.98     1.47     64      4
      18      45       463      2.06     1.05     31      3
      19      45       316      2.45     1.32     60      4
      20      45       280      2.25     1.08     36      4
      21      44       395      2.15     1.01     27      1
      22      49       139      2.20     1.36     59      0
      23      49       245      2.05     1.13     37      4
      24      49       373      2.15     0.88     25      1
      25      51       224      2.15     1.18     54      3
      26      51       677      2.10     1.16     33      4
      27      51       424      2.10     1.40     59      4
      28      51       150      2.10     1.05     30      0

8.34  You are presented with farm records for one year for a sample of 89 dairy farms located in a fairly homogeneous area in the same milk shed. The records contain the following information:

      Y     = milk sold per cow (lbs.)
      $X_1$ = amount of concentrates fed per cow
      $X_2$ = silage fed per cow
      $X_3$ = pasture cost per cow
      $X_4$ = amount of other roughage fed.

      You first decide to fit a multiple linear regression of Y, milk sold, on the four independent variates, the X's given above. Thus, the regression equation is of the form

      $$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4.$$

      (a)  List the numerical quantities and statistics you would compute to obtain this regression equation for Y. You need not give detailed formulas. In particular, you will wish to compare $b_2$ and $b_4$, or silage with other roughage fed in effect on milk production. Also, pasture is quite homogeneous in the area, so you suspect $\beta_3$ may not be different from zero. Include in your list such items as needed for examination of the indicated regression coefficients.

      (b)  Supposing you obtain $b_1 = +0.30$, what interpretation would you make of this statistic?

      (c)  Can you suggest any other form for this regression function, using only the given X's? If so, write it out.

8.35  Using the data given below, obtain a multiple linear regression equation. (Do a complete analysis.) Then, consider other possible analyses and comment on the "best" functional relationship.

      DATA FROM 25 IOWA COUNTIES*

      Column definitions:
        $X_1$ = corn yield per acre, 1910-1919
        $X_2$ = percentage farm land in small grain
        $X_3$ = number of improved acres per farm
        $X_4$ = number of brood sows per 1,000 acres
        $X_5$ = percentage farm land in corn
        Y     = value per acre of land, Jan. 1, 1920 (dollars)
        W     = sum

      Observation
      Number    County        $X_1$    $X_2$    $X_3$    $X_4$    $X_5$     Y        W

         1      Allamakee      40       11      103       42       14      $ 87     297
         2      Bremer         36       13      102       58       30       133     372
         3      Butler         34       19      137       53       30       174     447
         4      Calhoun        41       33      160       49       39       285     607
         5      Carroll        39       25      157       74       33       263     591
         6      Cherokee       42       23      166       85       34       274     624
         7      Dallas         40       22      130       52       37       235     516
         8      Davis          31        9      119       20       20       104     303
         9      Fayette        36       13      106       53       27       141     376
        10      Fremont        34       17      137       59       40       208     495
        11      Howard         30       18      136       40       19       115     358
        12      Ida            40       23      185       95       31       271     645
        13      Jefferson      37       14       98       41       25       163     378
        14      Johnson        41       13      122       80       28       193     477
        15      Kossuth        38       24      173       52       31       203     521
        16      Lyon           38       31      182       71       35       279     636
        17      Madison        34       16      124       43       26       179     422
        18      Marshall       45       19      138       60       34       244     540
        19      Monona         34       20      148       52       30       165     449
        20      Pocahontas     40       30      164       49       38       257     578
        21      Polk           41       22       96       39       35       252     485
        22      Story          42       21      132       54       41       280     570
        23      Wapello        35       16       96       41       23       167     378
        24      Warren         33       18      118       38       24       168     399
        25      Winneshiek     36       18      113       61       21       115     364

                Sums          937      488     3342     1361      745      4955   11828
                Means       37.48    19.52   133.68    54.44    29.80    198.20  473.12

      * Reproduced from: H. A. Wallace and G. W. Snedecor, Correlation and Machine Calculation (revised ed.; Ames, Iowa: The Iowa State College Press, 1931). By permission of the authors and publisher.
CHAPTER 9

CORRELATION  ANALYSIS 

IN CHAPTER 8, methods of estimating functional relationships among variables were presented. Such methods have many uses in experimental work. However, there is a related matter which also deserves attention when discussing the joint variation of two or more variables. It is: How closely are the variables associated? Or, in other words, what is the degree (or intensity) of association among the variables?

9.1  MEASURES  OF  ASSOCIATION 

The techniques that have been developed to provide measures of the
degree of association between variables are known as correlation
methods. This name reflects the universal practice of speaking about
"measures of correlation" rather than about "measures of the degree (or
intensity) of association." Consequently, when an analysis is performed
to determine the amount of correlation, it is referred to as a
correlation analysis. The resulting measure of correlation is usually
called a correlation coefficient.

In this chapter some of the more frequently used measures of
correlation will be presented. However, because of the close ties between
this chapter and some of the preceding chapters (particularly Chapter 8),
it will be sufficient to give only a minimum of discussion.

9.2  AN    INTUITIVE  APPROACH   TO   CORRELATION 

Because of the nature of the concept of correlation, it is clear that
(in most cases) it is closely related to the concept of regression. In fact,
for a given regression equation, it seems reasonable to expect that a
correlation coefficient will measure how well the regression equation
fits the data or, stating this in reverse fashion, how closely the sample
points hug the regression curve. Thus, a correlation coefficient will
undoubtedly be related to the standard error of estimate ($s_E$), which
measures the dispersion of the points about the regression curve.

Pursuing this idea and denoting the correlation coefficient by the
symbol R, we express R as a function of $s_E$; for example,

$$R = f(s_E). \tag{9.1}$$

If R is to perform satisfactorily as a measure of correlation, it is
desirable that it exhibit two characteristics:

(1)  It  should  be  large  when  the  degree  of  association  is  high  and 
small  when  the  degree  of  association  is  low. 

(2)  It  should  be  independent  of  the  units  in  which  the  variables 
are  measured. 
One way to achieve the desired properties is to (approximately)
define R by

$$R \cong \sqrt{1 - s_E^2 / s_Y^2} \tag{9.2}$$

where

$$s_E^2 = \sum (Y - \hat{Y})^2 / (n - q), \tag{9.3}$$

$$s_Y^2 = \sum (Y - \bar{Y})^2 / (n - 1), \tag{9.4}$$

and q is the number of parameters in the true regression function that
were estimated by the regression equation symbolized by $\hat{Y}$. If n is
large relative to q, another approximation is

$$R^2 \cong 1 - \sum (Y - \hat{Y})^2 / \sum (Y - \bar{Y})^2. \tag{9.5}$$

Since $\sum (Y - \hat{Y})^2 \le \sum (Y - \bar{Y})^2$, it is clear that
$0 \le R^2 \le 1$. Further, if the sample points hug the regression curve
closely (i.e., the correlation is high), $R^2$ will be close to 1.
Similarly, if the regression curve is a poor fit, the sample points will
be widely dispersed about the estimated regression and $R^2$ will be
close to 0, reflecting a low correlation.

Having  given  the  foregoing  intuitive  approach  to  correlation,  it  is 
necessary  that  a  more  precise  approach  be  formulated.  This  will  now 
be  done.  It  is  hoped  that  the  remarks  given  earlier  in  this  section  will 
aid  the  reader  in  appreciating  the  discussions  to  follow. 

9.3      THE   CORRELATION    INDEX 

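Rewriting Equation (9.5) as

$$R^2 = \frac{\sum (Y - \bar{Y})^2 - \sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} \tag{9.6}$$

and referring to Sections 8.8, 8.15, and 8.16, it is seen that

$$R^2 = \frac{\text{sum of squares due to regression}}{\text{corrected sum of squares}}. \tag{9.7}$$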


Since  the  ratio  defined  by  Equation  (9.7)  may  be  calculated  for  any 
estimated  regression  equation,  it  is  a  most  general  and  useful  measure 
of  correlation.  It  is  referred  to  as  the  correlation  index.  In  succeeding 
sections,  special  cases  will  be  examined  in  detail. 

9.4      CORRELATION    IN    SIMPLE    LINEAR    REGRESSION 

In Section 8.8, the partitioning of the sum of squares of the
dependent variable was discussed and the results presented in Table 8.2.
Referring to Table 8.2 and invoking Equation (9.6), we obtain

$$r^2 = \frac{b_1 \sum xy}{\sum y^2} = \frac{(\sum xy)^2}{\sum x^2 \sum y^2} \tag{9.8}$$

where $r^2$ is used instead of $R^2$ to conform with standard practice. It is
customary to talk about r rather than $r^2$. Thus, we have

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \tag{9.9}$$

which assumes the same sign as $\sum xy$, and hence the same sign as $b_1$.
It is readily seen that the correlation coefficient associated with
simple linear regression is easily obtained once a regression analysis has
been performed. Further, it is clear that

$$-1 \le r \le 1 \tag{9.10}$$

where $-1$ represents perfect negative linear association in the sample
and $+1$ represents perfect positive linear association in the sample. A
value of 0 is interpreted to mean that no linear association between X
and Y exists in the sample. Since r is only a sample value, any
inference to the sampled population must be carefully stated. More will be
said concerning this a little later.

Example  9.1 

Referring to Tables 8.3 and 8.4, the coefficient of linear correlation
between X and Y for the Schopper-Riegler data is determined as
follows:

$$r^2 = 10,177.59/(51,712.00 - 41,217.23) = 0.9698$$

$$r = \sqrt{0.9698} = 0.98.$$

Example 9.2

For the following data,

  X    Y
 -2    4
 -1    1
  0    0
  1    1
  2    4

it may be verified that r = 0, indicating no linear association. Please
note carefully the word "linear," for a moment's reflection will reveal
that X and Y are perfectly associated, the relationship being $Y = X^2$.
What we calculated was a measure of linear correlation when the
indicated relationship is actually quadratic. This simple example should
call to your attention one of the greatest potential trouble spots in
correlation analysis, namely, the use of an inappropriate measure of
correlation.
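The point of Example 9.2 can be checked numerically. The following is a minimal sketch in Python (the function names `pearson_r` and `correlation_index` are illustrative and do not come from the text): it computes r from Equation (9.9) and the correlation index from Equation (9.5), the latter using the fitted values $\hat{Y} = X^2$.

```python
import math

def pearson_r(xs, ys):
    """Sample product-moment correlation, Equation (9.9)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_index(ys, fitted):
    """Correlation index, Equation (9.5): 1 - sum(Y - Yhat)^2 / sum(Y - Ybar)^2."""
    y_bar = sum(ys) / len(ys)
    ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_corrected = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_residual / ss_corrected

X = [-2, -1, 0, 1, 2]
Y = [4, 1, 0, 1, 4]

r = pearson_r(X, Y)                                # linear measure
R2 = correlation_index(Y, [x ** 2 for x in X])     # index for the quadratic fit
print(r, R2)
```

The linear coefficient is zero while the correlation index for the quadratic curve is one, which is the contrast the example draws.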

The preceding discussion and interpretation of r, or perhaps we
should say of $r^2$, is most valuable in regression analyses. Examination
of Equations (9.7) and (9.8) reminds us that $100r^2$ is the percentage of
the corrected sum of squares that is "explained by" the fitting of the
simple linear regression $\hat{Y} = b_0 + b_1 X$. If this percentage is not
large enough to satisfy us, a better fitting regression equation should be
found.

Some terms associated with the coefficient of correlation that are
sometimes encountered are:

$$r^2 = \text{coefficient of determination}, \tag{9.11}$$

$$1 - r^2 = \text{coefficient of nondetermination}, \tag{9.12}$$

and

$$\sqrt{1 - r^2} = \text{coefficient of alienation}. \tag{9.13}$$


Example  9.3 

For the Schopper-Riegler data, $r^2 = 0.9698$ and $1 - r^2 = 0.0302$. Thus,
96.98 per cent of the variation in Y (Schopper-Riegler rating) is
"explained by" the linear regression of Y on X (hours of beating).

9.5 SAMPLING FROM A BIVARIATE NORMAL POPULATION

The interpretation of r given in the preceding section is valid for any
simple linear regression regardless of what assumptions are made
concerning the variables X and Y. However, if a random sample is drawn
from a bivariate normal population, then r [defined by Equation (9.9)]
is a sample estimate of the population parameter

$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}. \tag{9.14}$$

The reader should note that this is the same correlation coefficient
specified by Definition 3.31, and thus it is not surprising that r is
sometimes referred to as the sample product-moment correlation coefficient.

When sampling from a bivariate normal population, it is natural to
want to test hypotheses about the true value of $\rho$. Since such tests are
simply further examples of the general techniques introduced in
Chapter 7, only a brief explanation will be given.

To test $H: \rho = 0$ versus the alternative $A: \rho \ne 0$, we calculate

$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \tag{9.15}$$

and reject H if $t > t_{(1-\alpha/2)(n-2)}$ or if $t < -t_{(1-\alpha/2)(n-2)}$.
However, a minimum amount of simple algebra will show that

$$t = \frac{r}{s_r} = \frac{b_1}{s_{b_1}}, \tag{9.16}$$

and thus the test just detailed is identically equivalent to the test of
$H: \beta_1 = 0$ versus $A: \beta_1 \ne 0$ as given in Section 8.13. A review of that
section will remind you that the hypothesis might also be tested using
an F-ratio [see Equation (8.43)]. Consequently, three equivalent
methods of testing are available, the choice being determined by the
form of the analysis.

Example  9.4 

Given  the  sample  observations 


X 


11 


F 

it is easily verified that $r = -0.98$. Using Equation (9.15), we obtain
$t = (-0.98)\sqrt{3}/\sqrt{0.0413} = -8.49$. Since
$t = -8.49 < -t_{.995(3)} = -5.841$, the hypothesis $H: \rho = 0$ is rejected
in favor of the alternative $A: \rho \ne 0$. Clearly, a 1 per cent
significance level was used. It is suggested that the reader consider
$H: \beta_1 = 0$ versus $A: \beta_1 \ne 0$ and compare the resulting
test statistic with that computed above.
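As a computational aid, the statistic of Equation (9.15) can be sketched in a few lines of Python. The function name and the illustrative values r = 0.6 and n = 18 are hypothetical and are not taken from the text; the critical value must still be looked up in a table of "Student's" t.

```python
import math

def t_for_zero_correlation(r, n):
    """t statistic of Equation (9.15) for testing H: rho = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical sample: r = 0.6 from n = 18 pairs, so 16 degrees of freedom.
t = t_for_zero_correlation(0.6, 18)
print(t)  # compare |t| with the tabulated fractile, e.g. t_.975(16) = 2.120
```

With these hypothetical values the statistic exceeds the 5 per cent two-sided critical value, so H would be rejected.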


If the hypothesis to be tested is $H: \rho = \rho_0$ versus $A: \rho \ne \rho_0$, where
$\rho_0 \ne 0$, the test procedure is more complicated. The complication arises
because $(r - \rho_0)/s_r$ is not distributed as "Student's" t unless $\rho_0 = 0$.
When $\rho_0 \ne 0$, an approximate test is based on the transformation

$$z_r = \tfrac{1}{2}[\log_e (1 + r) - \log_e (1 - r)]
      = (1.1513)[\log_{10} (1 + r) - \log_{10} (1 - r)]. \tag{9.17}$$

Fisher (4) has shown that $z_r$ is approximately normally distributed
with mean $z_{\rho_0}$ and variance $\sigma_z^2 = 1/(n - 3)$. The approximate test
procedure is to calculate

$$z = (z_r - z_{\rho_0})/\sigma_z \tag{9.18}$$

and compare this quantity with fractiles of the standard normal
distribution. The hypothesis $H: \rho = \rho_0$ would be rejected if

$$z > z_{(1-\alpha/2)} \tag{9.19}$$

or if

$$z < -z_{(1-\alpha/2)}. \tag{9.20}$$

The research worker may also be interested in obtaining a confidence
interval estimate of $\rho$. This may be obtained by calculating

$$z_L = z_r - z_{(1-\alpha/2)} \sigma_z, \qquad z_U = z_r + z_{(1-\alpha/2)} \sigma_z \tag{9.21}$$

and then using Equation (9.17) to solve for $r_L$ and $r_U$.
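The transformation of Equation (9.17) and the test of Equation (9.18) lend themselves to a short computational sketch. The Python below is illustrative only: the function names are hypothetical, the normal fractile comes from the standard library's `statistics.NormalDist` rather than a printed table, and the interval is formed in the usual way as $z_r$ plus or minus the fractile times $\sigma_z$, then transformed back with the hyperbolic tangent (the inverse of Equation (9.17)).

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher's transformation, Equation (9.17)."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))

def z_statistic(r, rho_0, n):
    """Approximate normal deviate of Equation (9.18) for H: rho = rho_0."""
    sigma_z = 1.0 / math.sqrt(n - 3)
    return (fisher_z(r) - fisher_z(rho_0)) / sigma_z

def confidence_interval(r, n, alpha=0.05):
    """Approximate 100(1 - alpha) per cent limits (r_L, r_U) for rho."""
    fractile = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = fractile / math.sqrt(n - 3)
    z_r = fisher_z(r)
    # tanh inverts the z transformation, giving limits on the r scale.
    return math.tanh(z_r - half_width), math.tanh(z_r + half_width)

# Hypothetical illustration: r = 0.63245 observed in a sample of n = 102.
print(fisher_z(0.63245))
print(z_statistic(0.63245, 0.5, 102))
print(confidence_interval(0.63245, 102))
```

Because the interval is symmetric on the z scale but not on the r scale, the limits returned are generally not equidistant from r.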

Quite frequently the research worker has several independent
samples, each randomly selected from a bivariate normal population,
from which estimates $r_1, \ldots, r_k$ are obtained. If the research worker
can accept the hypothesis $H: \rho_1 = \cdots = \rho_k$, it is permissible to obtain
a pooled estimate of the common population correlation coefficient,
and this pooled estimate should, of course, be more reliable than any
of the individual estimates. If calculations are carried out as in Table
9.1 and the observed chi-square is not judged significant at the $100\alpha$
per cent significance level, a pooled estimate of $\rho$ (corresponding to the
"average z") may be found.

TABLE 9.1-Calculations for Testing the Hypothesis $\rho_1 = \cdots = \rho_k$ (k = 3)

Sample   Size of Sample (n)   n - 3      r          z        (n - 3)z    (n - 3)z^2
A               102             99     .63245     .74551     73.80549     55.02273
B               102             99     .77459    1.03168    102.13632    105.37200
C               102             99     .67082     .81223     80.41077     65.31204
Total           306            297                          256.35258    225.70677

Average z = $\sum_{i=1}^{k} (n_i - 3) z_i / \sum_{i=1}^{k} (n_i - 3)$      .86314
(Average z) $\sum_{i=1}^{k} (n_i - 3) z_i$                                 221.26817
$\chi^2$ for testing $H: \rho_1 = \cdots = \rho_k$                          4.43860
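The arithmetic of Table 9.1 can be reproduced with a short Python sketch. The function name `pool_correlations` is hypothetical; the chi-square is formed, as in the table, by subtracting (average z) times the sum of (n - 3)z from the sum of (n - 3)z^2, and it is referred to chi-square with k - 1 degrees of freedom (the usual result for this homogeneity test, stated here as an assumption since the table does not print the degrees of freedom). Small differences from the tabled figures are to be expected because the table was computed from rounded logarithms.

```python
import math

def pool_correlations(samples):
    """samples: list of (n_i, r_i) pairs from independent samples.

    Returns (average z, chi-square for homogeneity, pooled r).
    """
    weights = [n - 3 for n, _ in samples]
    zs = [math.atanh(r) for _, r in samples]          # Equation (9.17)
    sum_wz = sum(w * z for w, z in zip(weights, zs))
    sum_wz2 = sum(w * z * z for w, z in zip(weights, zs))
    average_z = sum_wz / sum(weights)
    chi_square = sum_wz2 - average_z * sum_wz         # k - 1 degrees of freedom
    pooled_r = math.tanh(average_z)                   # back to the r scale
    return average_z, chi_square, pooled_r

# Samples A, B, and C of Table 9.1.
average_z, chi_square, pooled_r = pool_correlations(
    [(102, 0.63245), (102, 0.77459), (102, 0.67082)]
)
print(average_z, chi_square, pooled_r)
```

The pooled estimate is meaningful only when the chi-square is not judged significant, as the text states.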

9.6 CORRELATION IN MULTIPLE LINEAR REGRESSION

When a multiple linear regression equation has been fitted to a set of
data, as in Section 8.15, it is natural to seek a measure of correlation
which reflects the "goodness of the fit." The correlation index defined
in Section 9.3 may be used to give us what we desire. Referring to
Equation (8.68), it is seen that

$$R^2 = \frac{\text{sum of squares due to regression}}{\text{corrected sum of squares}}. \tag{9.22}$$

This may also be expressed as

$$R^2 = \frac{b_1 \sum x_1 y + \cdots + b_k \sum x_k y}{\sum y^2} \tag{9.23}$$

which is analogous to the expression

$$r^2 = \frac{b_1 \sum xy}{\sum y^2}$$

as given in Equation (9.8). If we calculate $R = \sqrt{R^2}$, where $R^2$ is defined
by Equation (9.22) or Equation (9.23), then R is known as the
multiple correlation coefficient. The significance of R may be assessed by
the F-test specified in Equation (8.76). No example will be given at
this time since nothing new and different is involved. However, some
of the problems at the end of the chapter will require the calculation
and interpretation of the coefficient of multiple correlation.

It is also worth noting that R, as defined by Equation (9.22), may
be thought of as a simple linear correlation between Y and $\hat{Y}$ where

$$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_k X_k. \tag{9.24}$$

Closely allied to the topic of multiple correlation is that of partial
correlation. By partial correlation is meant the correlation between two
variables in a multivariable problem under the restriction that any
common association with the remaining variables (or some of them)
has been "eliminated." Clearly, many partial correlation coefficients
may be calculated. For example, a first order partial correlation
coefficient is one which measures the degree of linear association between
two variables after taking into account their common association with
a third variable. Symbolically,

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \tag{9.25}$$

where the subscripts refer to the three variables $X_1$, $X_2$, and $X_3$. Here,
of course, $r_{12.3}$ is attempting to measure the correlation between $X_1$
and $X_2$ independent of $X_3$. It should also be clear that $r_{ij}$ (i, j = 1, 2, 3)
are simple linear correlation coefficients measuring the correlation
between $X_i$ and $X_j$. A second order partial correlation coefficient may be
illustrated by

$$r_{12.34} = \frac{r_{12.3} - r_{14.3} r_{24.3}}{\sqrt{(1 - r_{14.3}^2)(1 - r_{24.3}^2)}} \tag{9.26}$$

which measures the correlation between $X_1$ and $X_2$ independent of $X_3$
and $X_4$.
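The recursive structure of Equations (9.25) and (9.26) is easy to see in code. The following Python sketch is illustrative; the function names and the numerical correlations are hypothetical. The second order coefficient is obtained by applying the first order formula to first order coefficients.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """First order partial correlation r_ab.c, the form of Equation (9.25)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def second_order_partial_r(r12, r13, r14, r23, r24, r34):
    """Second order partial correlation r_12.34, Equation (9.26)."""
    r12_3 = partial_r(r12, r13, r23)
    r14_3 = partial_r(r14, r13, r34)
    r24_3 = partial_r(r24, r23, r34)
    # Same formula again, now applied to the first order coefficients.
    return partial_r(r12_3, r14_3, r24_3)

# Hypothetical simple correlations among four variables.
print(partial_r(0.6, 0.5, 0.5))
print(second_order_partial_r(0.6, 0.5, 0.3, 0.5, 0.2, 0.4))
```

When the variables being "eliminated" are uncorrelated with both $X_1$ and $X_2$, the partial coefficient reduces to the simple coefficient $r_{12}$.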

Before proceeding to another topic, it will be worth digressing for a
moment to discuss a related matter (related to partial correlation,
that is) in regression. In Section 8.15, the equation

$$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_k X_k \tag{9.27}$$

was discussed for the case k = 4. At that time, had we so desired, it
would have been appropriate to call attention to a different system of
notation which is sometimes encountered. For k = 4, Equation (9.27)
would appear as

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4. \tag{9.28}$$

An alternative notation is

$$\hat{Y} = b_0 + b_{Y1.234} X_1 + b_{Y2.134} X_2 + b_{Y3.124} X_3 + b_{Y4.123} X_4, \tag{9.29}$$

and in this form the analogy with partial correlation is evident. Strictly
speaking, the coefficients should be called partial regression coefficients
where, for example, $b_{Y1.234}$ represents how Y would vary per unit change
in $X_1$ if $X_2$, $X_3$, and $X_4$ were all held fixed. Thus, $b_{Y1.234}$ (or, as we usually
denote it, $b_1$) gives only a partial picture of what happens to Y as $X_1$
changes. Hence the adjective "partial." It should be clear that the less
cumbersome notation was used (at the risk of not clearly defining the
meaning) solely to simplify the writing of the equations.

9.7 THE CORRELATION RATIO

Closely related to the correlation index is a quantity known as the
correlation ratio. Denoted by $E^2$, it is defined by

$$E^2 = \frac{\sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2} \tag{9.30}$$

where $\bar{Y}_i$ is the mean of the ith group consisting of $n_i$ observations and
$\bar{Y}$ is the mean of all observations. Expressing Equation (9.30) in words,

$$E^2 = \frac{\text{among groups sum of squares}}{\text{corrected sum of squares}} \tag{9.31}$$

where the quantity labeled "among groups sum of squares" is most
easily found using the identity

$$\sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{k} G_i^2 / n_i - T^2 / n \tag{9.32}$$

where

$$G_i = \sum_{j=1}^{n_i} Y_{ij} = n_i \bar{Y}_i \tag{9.33}$$

= total of the observations in the ith group

and

$$T = \sum_{i=1}^{k} G_i = n \bar{Y} \tag{9.34}$$

= total of all observations.

It should be clear, of course, that

$$n = \sum_{i=1}^{k} n_i \tag{9.35}$$

= total number of observations.

A moment's reflection will indicate that the value of $E^2$ is highly
dependent on the choice of groups. For example, if there is only one
observation in each group, the value of $E^2$ is unity; if all the
observations are in one group, the value of $E^2$ is 0. Great care, then, must be
exercised when grouping the observations.
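The dependence of $E^2$ on the grouping can be seen directly by computing it. The sketch below, in Python, uses the identity (9.32) for the among groups sum of squares; the function name and the small data sets are hypothetical.

```python
def correlation_ratio(groups):
    """E^2 of Equation (9.31); groups is a list of lists of Y values."""
    n = sum(len(g) for g in groups)                    # Equation (9.35)
    total = sum(sum(g) for g in groups)                # T, Equation (9.34)
    correction = total ** 2 / n
    # Identity (9.32): sum of G_i^2 / n_i minus T^2 / n.
    among = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    corrected = sum(y * y for g in groups for y in g) - correction
    return among / corrected

print(correlation_ratio([[1, 2, 3], [4, 5, 6]]))   # two groups of three
print(correlation_ratio([[1], [2], [3]]))          # one observation per group
print(correlation_ratio([[1, 2, 3]]))              # everything in one group
```

The last two calls reproduce the extreme cases mentioned above: unity when every group holds a single observation, and zero when all observations form one group.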

Another point of interest is the following: Once the observations are
assembled in groups, the value of $E^2$ is determined solely from the
values of the "dependent" variable. Consequently, the "independent"
variable need not be a quantitative variable. It can be a qualitative
variable. That is, subject to the dangers implicit in the grouping, the
correlation ratio may be used to measure the correlation between a
quantitative variable and a qualitative variable.

Since grouping is so important, some guidance is necessary. One rule
of thumb is to have three to five groups, each containing a large
number of observations (say 100). Strict rules of procedure are hard to
define, but the preceding rule may prove helpful. Denoting the
population correlation ratio by $\eta^2$, Woo (13) gives tables for use in testing the
hypothesis $H: \eta = 0$ when we are willing to assume that the $Y_{ij}$ are
normally and independently distributed (with common variance) in
each group.

Because the analysis of variance form of presenting results is so often
encountered, it should not be surprising to find it helpful in the present
situation. Referring to Table 9.2, it is seen that the sums of squares
needed in Equation (9.31) are easily accessible. (NOTE: Now that the
opportunity has presented itself, we shall take a moment to review
the symbolism introduced in Section 7.20. It seems almost unnecessary
to remark that the letters M, G, and W in the symbols $M_{yy}$, $G_{yy}$, and
$W_{yy}$ were chosen to stand for the words "Mean, Groups, and Within,"
respectively. However, since this abbreviated method of representing
various sums of squares will be used extensively in later chapters, it is
a good idea to become well acquainted with the notation as early as
possible.)
TABLE 9.2-Analysis of Variance Associated With the
Calculation of a Correlation Ratio

Source of Variation   Degrees of Freedom           Sum of Squares                                  Mean Square
Mean                  $1$                          $M_{yy} = T^2/n$                                $M_{yy}/1$
Among groups          $k - 1$                      $G_{yy} = \sum_{i=1}^{k} G_i^2/n_i - T^2/n$     $G_{yy}/(k - 1)$
Within groups         $\sum_{i=1}^{k} (n_i - 1)$   $W_{yy} = \sum Y^2 - M_{yy} - G_{yy}$           $W_{yy}/\sum_{i=1}^{k} (n_i - 1)$
Total                 $n$                          $\sum Y^2$

9.8      BISERIAL   CORRELATION 

A measure of correlation encountered frequently in such areas of
specialization as education, psychology, and public health is the
biserial correlation coefficient. Only a brief discussion will be given in this
text. Those persons interested in more detail are referred to McNemar
(7), Pearson (10), and Treloar (12).

The biserial correlation coefficient, usually denoted by $r_b$, is used
where one variable, Y, is quantitatively measured while the second
variable, X, is dichotomized, that is, defined by two groups. The
assumptions necessary for a meaningful interpretation of $r_b$ are:

(1) Y is normally distributed and suffers little due to broad
grouping (if grouping is necessary).

(2) The true distribution underlying the dichotomized variable X
should be of normal form.

(3) The regression of Y on X is linear.

(4) The mean value of Y in the minor, or smaller, category as
specified by X, denoted by $\bar{Y}_1$, is to be on the regression line.
This assumption implies a large number of observations in the
minor segment.

If we define:

p = proportion of observations in the major category
q = proportion of observations in the minor category
z = ordinate of the standard normal curve at the point cutting off a
    tail of that distribution with area equal to q
$\bar{Y}_2$ = mean of the Y values in the major category
$s_Y$ = standard deviation of all the Y values,

then

$$r_b = \frac{(\bar{Y}_2 - \bar{Y}_1)\, pq}{z\, s_Y} \tag{9.36}$$

and this gives us a measure of the degree of linear association between
X and Y.
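A computational sketch of the biserial coefficient is given below in Python. It assumes the common textbook form $r_b = (\bar{Y}_2 - \bar{Y}_1)pq/(z s_Y)$, takes $s_Y$ to be the sample standard deviation with divisor n - 1 (an assumption, since the divisor is not specified here), and obtains the normal ordinate from the standard library rather than from a table. The function name and the tiny data set are hypothetical and serve only to show the arithmetic; a real application requires many observations in the minor category.

```python
from statistics import NormalDist, mean, stdev

def biserial_r(major, minor):
    """Biserial correlation from the Y values in the major and minor categories."""
    n = len(major) + len(minor)
    p = len(major) / n          # proportion in the major category
    q = len(minor) / n          # proportion in the minor category
    # Ordinate of the standard normal curve at the point cutting off a tail of area q.
    cut_point = NormalDist().inv_cdf(q)
    z = NormalDist().pdf(cut_point)
    s_y = stdev(major + minor)  # assumed divisor: n - 1
    return (mean(major) - mean(minor)) * p * q / (z * s_y)

# Hypothetical values, far too few for practical use.
r_b = biserial_r(major=[3, 5], minor=[1, 3])
print(r_b)
```

Unlike the product-moment coefficient, this statistic is not guaranteed to stay within plus or minus one when the normality assumptions fail.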

It should be mentioned that, in a manner analogous to the way in
which we developed the correlation ratio, Pearson (10) introduced the
concept of a biserial correlation ratio, denoted by $E_b$, which extends the
biserial correlation concept to cover any postulated regression function.
We shall not go into detail here, but the reader is referred to Pearson
(10) and Treloar (12) if he is interested in such problems.

9.9  TETRACHORIC  CORRELATION 

Another measure frequently encountered in some areas of research is
the tetrachoric correlation coefficient. This is generally denoted by $r_t$ and
is used to measure the degree of linear association between two
variables, X and Y, where both are dichotomized and the true underlying
distributions are assumed to be normal. That is, if we have samples
from a bivariate normal population but the measurements are not
available (we know only to which cell of a 2 x 2 contingency table each
observation belongs), we can obtain a measure of the correlation
between X and Y. It is not feasible to present a formula for $r_t$, but
reference to McNemar (7), Treloar (12), and other works will indicate
calculational methods for those interested in this particular statistic.

9.10  COEFFICIENT  OF  CONTINGENCY 

Of some interest, also, is a measure of the degree of association
between two characteristics where our observational data are classified in
an r x c contingency table. In Chapter 7 we gave a method for testing
the hypothesis that these two characteristics, or classifications, were
independent of one another. Suppose, however, that we are more
interested in estimating the degree of association between them than
testing the hypothesis of independence. How may we do this? Pearson (9)
proposed for this purpose a measure known as the coefficient of
contingency defined by

$$C = \sqrt{\frac{\chi^2}{n + \chi^2}}, \tag{9.37}$$

where $\chi^2$ is the usual chi-square statistic for a contingency table, as
given in Chapter 7, and n is the total number of observations. In the
case of a 2 x 2 table, this may seem to be analogous to a tetrachoric
correlation coefficient, but the coefficient of contingency is of wider
generality because we no longer require the assumption of normality of
the underlying distributions. Any distribution, discrete or continuous,
is acceptable. However, there is a disadvantage to this measure of
association: its maximum possible value varies with the number of rows
and columns, and thus two different values of C are not directly
comparable unless computed from tables of the same size. For further
remarks on this measure, the reader is referred to McNemar (7) and
Treloar (12).
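The dependence of the maximum of C on the size of the table can be illustrated numerically. The Python sketch below assumes the usual definition $C = \sqrt{\chi^2/(n + \chi^2)}$, with $\chi^2$ computed from observed and expected cell frequencies; the function name and the tables are hypothetical.

```python
import math

def contingency_coefficient(table):
    """Pearson's coefficient of contingency for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    return math.sqrt(chi_square / (n + chi_square))

# Perfect association in a 2 x 2 table and in a 3 x 3 table.
c_2x2 = contingency_coefficient([[10, 0], [0, 10]])
c_3x3 = contingency_coefficient([[10, 0, 0], [0, 10, 0], [0, 0, 10]])
print(c_2x2, c_3x3)
```

Both tables show perfect association, yet the two coefficients differ, which is why values of C from tables of different sizes should not be compared directly.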

9.11       RANK   CORRELATION 

Let us now consider a slightly different problem but one that arises
quite frequently in certain areas of research. The problem is as follows:
n individuals are ranked from 1 to n according to some specified
characteristic by m observers, and we wish to know if the m rankings are
substantially in agreement with one another. How may we answer such
a query? Kendall and Smith (6) have proposed a measure known as the
coefficient of concordance, W, for answering this question. It is
defined by

$$W = \frac{12S}{m^2 (n^3 - n)}, \tag{9.38}$$

where S equals the sum of the squares of the deviations of the total of
the ranks assigned to each individual from m(n + 1)/2. The quantity
m(n + 1)/2 is, of course, the average value of the totals of the ranks,
and hence S is the usual sum of squares of deviations from the mean.
W varies from 0 to 1, 0 representing no community of preference, while
unity represents perfect agreement. The hypothesis that the observers
have no community of preference may be tested using tables given in
Kendall (5) or, more simply (for n > 7), by calculating

$$\chi^2 = \frac{12S}{mn(n + 1)} \tag{9.39}$$

which is approximately distributed as chi-square with $\nu = n - 1$ degrees
of freedom. If there are "ties" in some of the rankings, it may be
necessary to modify our formulas somewhat; if such a case is encountered,
the researcher is referred to Kendall (5).

If we find W to be significant, the next step is to estimate the true
ranking of the n individuals. This is done by ranking them according to
the sum of the ranks assigned to each, the one with the smallest sum
being ranked first, the one with the next smallest sum being ranked
second, and so on. If two sums are equal, we rank these two individuals
by the sum of the squares of the ranks assigned to them, the one with
the smaller sum of squares obviously being ranked ahead of the other.
If W is not significant, we are not justified in attempting to find an
"average," or "pooled," estimate of a true ranking, for we are not at
all certain that such a true ranking even exists.

When m = 2, that is, when only two rankings are available, a slightly
different approach is often used. In this case, a measure known as
Spearman's rank correlation coefficient is computed. Spearman's rank
correlation coefficient, denoted by $r_s$, is defined by

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n} \tag{9.40}$$

where $d_i$ equals the difference between the two ranks assigned to the ith
individual. It can easily be seen that $r_s$ varies from $-1$ to $+1$ (whereas
W varied only from 0 to 1), $-1$ signifying perfect disagreement and
$+1$ signifying perfect agreement between the two rankings. A test of
the null hypothesis $H: \rho_s = 0$ may be made using tables provided by
Olds (8). We must remember, however, that the same conclusion,
namely, to accept or reject H, could be reached by computing W and
comparing with the tabulated values for m = 2. Incidentally, we should
remark that Kendall (5) does not tabulate W itself but only the
associated value of S. This, of course, cuts down the amount of arithmetic
required since it is not necessary actually to compute the value of W
in order to perform our statistical test. Similarly, Olds (8) only
tabulates $\sum_{i=1}^{n} d_i^2$.

Example  9.5 

Consider the data of Table 9.3. Calculations yield

$$r_s = 0.771 \quad \text{with} \quad \sum_{i=1}^{n} d_i^2 = 8.$$

Using $\alpha = 0.05$, the hypothesis $H: \rho_s = 0$ is rejected. [NOTE: This
conclusion was reached after consulting the tables provided by Olds (8).]
Thus, it is concluded that the two judges are in quite good agreement.


TABLE 9.3-Preferences for Six Lemonades as Expressed by Two Judges

Lemonade   Ranking Given      Ranking Given      Difference in
           by Judge No. 1     by Judge No. 2     Ranks = d
A                4                  4                  0
B                1                  2                 -1
C                6                  5                  1
D                5                  6                 -1
E                3                  1                  2
F                2                  3                 -1
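The calculation for the two judges can be sketched in a few lines of Python. The function name `spearman_rs` is hypothetical; the formula is the standard one, $r_s = 1 - 6\sum d_i^2/(n^3 - n)$, which assumes no tied ranks, and the rankings are those of the six lemonades A through F.

```python
def spearman_rs(ranks_1, ranks_2):
    """Spearman's rank correlation (no ties); also returns the sum of d^2."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n ** 3 - n), sum_d2

judge_1 = [4, 1, 6, 5, 3, 2]   # lemonades A, B, C, D, E, F
judge_2 = [4, 2, 5, 6, 1, 3]

r_s, sum_d2 = spearman_rs(judge_1, judge_2)
print(round(r_s, 3), sum_d2)
```

The sum of squared differences is the quantity to be referred to the tables of Olds (8); the coefficient itself is needed only for descriptive purposes.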

Example 9.6

Consider the data of Table 9.4. It may be verified that m(n + 1)/2
= 10.5, S = 25.5, and W = 0.162. Examination of the tables in Kendall
(5) leads us to accept the hypothesis of no community of preference
among our three judges, and thus we shall not attempt to estimate
any "true order of preference."

TABLE 9.4-Preferences for Six Lemonades as Expressed by Three Judges

Lemonade   Ranking Given     Ranking Given     Ranking Given     Sum of Ranks
           by Judge No. 1    by Judge No. 2    by Judge No. 3
A                5                 2                 4                11
B                4                 3                 1                 8
C                1                 1                 6                 8
D                6                 5                 3                14
E                3                 6                 2                11
F                2                 4                 5                11
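The figures quoted in Example 9.6 can be reproduced with the following Python sketch. The function name `concordance` is hypothetical; the formulas are the standard ones for the no-ties case, $W = 12S/[m^2(n^3 - n)]$ and $\chi^2 = 12S/[mn(n + 1)]$, and the rankings are those of the three judges for lemonades A through F.

```python
def concordance(rankings):
    """Kendall's coefficient of concordance for m rankings of n individuals (no ties).

    Returns (S, W, chi_square); the chi-square approximation is intended for n > 7.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_totals = [sum(column) for column in zip(*rankings)]
    mean_total = m * (n + 1) / 2
    S = sum((total - mean_total) ** 2 for total in rank_totals)
    W = 12 * S / (m ** 2 * (n ** 3 - n))
    chi_square = 12 * S / (m * n * (n + 1))   # n - 1 degrees of freedom
    return S, W, chi_square

judges = [
    [5, 4, 1, 6, 3, 2],   # Judge No. 1, lemonades A-F
    [2, 3, 1, 5, 6, 4],   # Judge No. 2
    [4, 1, 6, 3, 2, 5],   # Judge No. 3
]
S, W, chi_square = concordance(judges)
print(S, round(W, 3))
```

Since n = 6 here, the exact tables of Kendall (5) rather than the chi-square approximation are the appropriate reference, as in the example.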

9.12 INTRACLASS CORRELATION

The measure of correlation to be discussed in this section was devised
to assess the degree of association (or similarity) among individuals
within classes or groups. For this reason, the measure is known as the
intraclass correlation coefficient. (NOTE: Some authors have referred to
the intraclass correlation coefficient as the coefficient of homotypic
correlation, but the former term is more common.)

As an example of a situation in which the intraclass correlation
coefficient is the proper measure, consider the problem of measuring the
correlation between heights of brothers. Because all that is desired is a
measure of similarity between heights of brothers, any attempt to
label one as X and the other Y (for example, by age) would introduce
a spurious element into the correlation. The spurious element, of
course, would be that an ordinary (simple linear) correlation would
measure the correlation between the heights of older brothers and the
heights of younger brothers rather than simply assess the "sameness"
of heights of brothers.

TABLE 9.5-Symbolic Representation of Data To Be Used in Calculating
the Intraclass Correlation Coefficient

                          Groups
                  1        2       ...      k
Observations*    Y_11     Y_21     ...     Y_k1
                 Y_12     Y_22     ...     Y_k2
                 ...      ...      ...     ...
                 Y_1n     Y_2n     ...     Y_kn
Total            G_1      G_2      ...     G_k

* Each observation is assumed to be of the form $Y_{ij} = \mu + g_i + \epsilon_{ij}$
where $\mu$ is a constant, $g_i$ is a random variable with mean 0 and variance
$\sigma_G^2$, and $\epsilon_{ij}$ is a random variable with mean 0 and variance $\sigma^2$.
That is, a linear model has been postulated which states that any
observation is a linear combination of three contributing factors: an
over-all mean effect, an effect due to the particular group to which the
observation belongs, and an "error" effect representing all extraneous
sources of variation.

TABLE 9.6-General Analysis of Variance for Calculating the Intraclass
Correlation Coefficient Using the Data of Table 9.5

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square         Expected Mean Square
Mean                  $1$                  $M_{yy}$
Among groups          $k - 1$              $G_{yy}$         $s^2 + n s_G^2$     $\sigma^2 + n \sigma_G^2$
Within groups         $k(n - 1)$           $W_{yy}$         $s^2$               $\sigma^2$
Total                 $kn$                 $\sum \sum Y^2$

The intraclass correlation coefficient, denoted by $r_I$, is most easily
calculated using analysis of variance techniques. Given the data of
Table 9.5, the variation among the kn observations may be summarized
as in Table 9.6, where

$T = G_1 + G_2 + \cdots + G_k,$                                  (9.41)

$M_{yy} = T^2/(kn),$                                             (9.42)

$G_{yy} = \sum_{i=1}^{k} G_i^2/n - M_{yy},$                      (9.43)

and

$W_{yy} = \sum_{i=1}^{k}\sum_{j=1}^{n} Y_{ij}^2 - M_{yy} - G_{yy}.$   (9.44)

Since the population intraclass correlation coefficient is defined by

$\rho_I = \dfrac{\sigma_G^2}{\sigma^2 + \sigma_G^2},$            (9.45)

a sample estimate is provided by

$r_I = \dfrac{s_G^2}{s^2 + s_G^2}
     = \dfrac{MS_a - MS_w}{MS_a + (n - 1)MS_w},$                 (9.46)

where

$MS_a$ = mean square among groups
       $= s^2 + n s_G^2$                                         (9.47)
       $= G_{yy}/(k - 1)$

and

$MS_w$ = mean square within groups
       $= s^2$                                                   (9.48)
       $= W_{yy}/[k(n - 1)].$

It will be seen that if n = 2, the analysis would fit the situation described
earlier, namely, the correlation between the heights of brothers.
(NOTE: Once again we have availed ourselves of the opportunity to
introduce some new notation. This time the concept of components of
variance, denoted by $s^2$ and $s_G^2$, has been used as an alternative
way of expressing mean squares. The relationship between "expected mean
squares" and "mean squares" is, of course, simply the familiar
relationship between "population parameters" and "sample statistics."
The determination of the form of the various expected mean squares
will be examined in detail in succeeding chapters, where linear models
will be the main topic of discussion. Those who desire more information
on this topic may jump ahead to the appropriate sections.)

If one is willing to assume that the individuals within groups are
random samples from normal populations (one population per group)
and that each population has the same variance, then the hypothesis
$H\colon \rho_I = 0$ is equivalent to the hypothesis
$H\colon \sigma_G^2 = 0$, and this may be tested using

$F = MS_a/MS_w$                                                  (9.49)

with degrees of freedom $\nu_1 = k - 1$ and $\nu_2 = k(n - 1)$.

Example  9.7 

Given the data in Table 9.7, calculations will lead to the analysis of
variance shown in Table 9.8. From this we obtain $r_I = 0.6974$. To test
$H\colon \rho_I = 0$, we calculate F = 30.857/5.500 = 5.61 with
$\nu_1 = 7$ and $\nu_2 = 8$

TABLE 9.7-Heights of Eight Pairs of Brothers

          Heights
Pair      (inches)

A         71; 71
B         69; 72
C         59; 65
D         65; 64
E         66; 60
F         73; 72
G         68; 67
H         70; 68

TABLE 9.8-Analysis of Variance for Data of Table 9.7

                         Degrees of   Sum of       Mean        Expected
Source of Variation      Freedom      Squares      Square      Mean Square

Mean                      1           72,900.0     72,900.0
Among groups              7              216.0       30.857    $\sigma^2 + 2\sigma_G^2$
  (Among pairs of
  brothers)
Within groups             8               44.0        5.500    $\sigma^2$
  (Between brothers
  within pairs)

Total                    16           73,160.0

degrees of freedom. Since $F = 5.61 > F_{0.95}(7, 8) = 3.5$, the
hypothesis $H\colon \rho_I = 0$ is rejected.
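The arithmetic of Example 9.7 can be retraced with a short program. The sketch below is only an illustration: the function name `intraclass_anova` is hypothetical, equal group sizes are assumed, and the quantities follow the layout of Table 9.6 and the estimate in Equation (9.46), applied to the heights of Table 9.7.

```python
# Heights (inches) of eight pairs of brothers, copied from Table 9.7.
pairs = [(71, 71), (69, 72), (59, 65), (65, 64),
         (66, 60), (73, 72), (68, 67), (70, 68)]


def intraclass_anova(groups):
    """Return (MS_a, MS_w, r_I, F) for k groups of n observations each."""
    k = len(groups)
    n = len(groups[0])
    total = sum(sum(g) for g in groups)                   # T, grand total
    m_yy = total ** 2 / (k * n)                           # sum of squares for the mean
    g_yy = sum(sum(g) ** 2 for g in groups) / n - m_yy    # among groups
    sum_sq = sum(y ** 2 for g in groups for y in g)       # sum of Y squared
    w_yy = sum_sq - m_yy - g_yy                           # within groups
    ms_a = g_yy / (k - 1)
    ms_w = w_yy / (k * (n - 1))
    r_i = (ms_a - ms_w) / (ms_a + (n - 1) * ms_w)         # sample intraclass correlation
    return ms_a, ms_w, r_i, ms_a / ms_w


ms_a, ms_w, r_i, f_ratio = intraclass_anova(pairs)
print(f"MS_a = {ms_a:.3f}, MS_w = {ms_w:.3f}")
print(f"r_I = {r_i:.4f}, F = {f_ratio:.2f}")
```

The computed F ratio is then compared with the tabulated value of F for 7 and 8 degrees of freedom, exactly as in the example; the program does not look up that critical value.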

9.13      CORRELATIONS  OF  SUMS  AND    DIFFERENCES 

Reference to Section 5.14 reminds us that, for any constants $a_i$ and
any variables $X_i$, the linear combination specified by

$U = \sum_{i=1}^{n} a_i X_i$                                     (9.50)

has

$\mu_U = E[U] = \sum_{i=1}^{n} a_i \mu_i$                        (9.51)

and

$\sigma_U^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2
            + 2\sum_{i<j} a_i a_j \sigma_{ij},$                  (9.52)

where $\mu_i$ is the mean of $X_i$, $\sigma_i^2$ is the variance of
$X_i$, and $\sigma_{ij}$ is the covariance of $X_i$ and $X_j$. Thus, if
$U = X_1 \pm X_2$,

$\mu_U = \mu_1 \pm \mu_2$                                        (9.53)

and

$\sigma_U^2 = \sigma_1^2 + \sigma_2^2 \pm 2\sigma_{12}.$         (9.54)

Utilizing Definition (3.31), it is easily verified that Equation (9.54)
may be rewritten as

$\sigma_U^2 = \sigma_1^2 + \sigma_2^2 \pm 2\rho_{12}\sigma_1\sigma_2.$   (9.55)

Rearranging terms, we obtain

$\rho_{12} = \dfrac{\sigma_U^2 - \sigma_1^2 - \sigma_2^2}{2\sigma_1\sigma_2},
  \qquad U = X_1 + X_2$                                          (9.56)

or

$\rho_{12} = \dfrac{\sigma_1^2 + \sigma_2^2 - \sigma_U^2}{2\sigma_1\sigma_2},
  \qquad U = X_1 - X_2.$                                         (9.57)

This leads to an alternative method of obtaining $r_{12}$ (the sample
estimate of $\rho_{12}$), namely:

$r_{12} = \dfrac{s_U^2 - s_1^2 - s_2^2}{2 s_1 s_2},
  \qquad U = X_1 + X_2$                                          (9.58)

or

$r_{12} = \dfrac{s_1^2 + s_2^2 - s_U^2}{2 s_1 s_2},
  \qquad U = X_1 - X_2.$                                         (9.59)

Before terminating our discussion of the correlation of sums and
differences, attention must be directed to the relationship between the
contents of this section and the "method of paired observations"
examined in Sections 6.9 and 7.9. Noting that $D = X - Y$ is analogous
to $U = X_1 - X_2$, we recognize that a legitimate pairing of related
observations will yield a smaller standard error of the mean difference if
a positive correlation exists. Such a reduction in the standard error
represents a gain in efficiency (relative to nonpairing) which will be
reflected in a shorter confidence interval, an easier establishment of
statistical significance, or a smaller sample size. Clearly, the success of
pairing in any situation depends upon the extent to which the
researcher can introduce positive correlation into an experiment.
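The size of the gain can be made explicit with a short worked comparison. It is offered only as a sketch: n pairs are assumed, the unpaired alternative is taken to be two independent samples of n observations each, and the numerical values are hypothetical.

```latex
% Variance of the mean difference with pairing (D = X_1 - X_2, n pairs):
s_{\bar{D}}^2 = \frac{s_D^2}{n}
             = \frac{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2}{n}
% Without pairing (two independent samples of size n):
s_{\bar{X}_1 - \bar{X}_2}^2 = \frac{s_1^2 + s_2^2}{n}
% Hypothetical values: s_1^2 = s_2^2 = 4, r_{12} = 0.75, n = 16
s_{\bar{D}}^2 = \frac{4 + 4 - 2(0.75)(2)(2)}{16} = 0.125,
\qquad
s_{\bar{X}_1 - \bar{X}_2}^2 = \frac{4 + 4}{16} = 0.5
```

With these illustrative figures the standard error falls from about 0.71 to about 0.35. The paired analysis has n - 1 degrees of freedom rather than 2(n - 1), so pairing pays only when the correlation is large enough to outweigh that loss.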

Problems 

9.1  Using the data of Example 9.4, test $H\colon \beta_1 = 0$ versus
$A\colon \beta_1 \neq 0$ using: (a) a t-test, (b) an F-test. In both
tests, let $\alpha = 0.01$.

9.2  If U = a + bX and V = c + dY, show that $r_{UV} = r_{XY}$.

9.3  Verify  Equation  (9.16). 

9.4  Interpret  a  simple  linear  correlation  coefficient  of  —0.8. 

9.5  If the simple linear (product-moment) correlation coefficient between
X and Y is $r_{XY} = 0.8$, what are the values of:

(a)   rxv,  (6)   rxf  ,          and  (c)  ry$l 

9.6  Using  the  data  of  Problem  4.3  and  the  results  of  Problem  8.12,  com 
pute  and  interpret  the  appropriate  measure  of  correlation. 

9.7  Using  the  data  of  the  problem  indicated,  compute  and  interpret  the 
appropriate  measure  of  correlation: 

(a) 8.4    (e) 8.8    (i) 8.14   (m) 8.20   (q) 8.26   (t) 8.29   (w) 8.32
(b) 8.5    (f) 8.9    (j) 8.15   (n) 8.21   (r) 8.27   (u) 8.30   (x) 8.33
(c) 8.6    (g) 8.11   (k) 8.16   (o) 8.22   (s) 8.28   (v) 8.31   (y) 8.35
(d) 8.7    (h) 8.13   (l) 8.17   (p) 8.24

9.8  The following table gives hypothetical data for the covariates X and
Y, selected at random from a bivariate normal distribution.

  X      Y          X      Y

 12     74         18    149
 20    170         16    142
 17    147         13    144
 11     75         18    173
  8     46         11    101
  8     59         16    140
  4     20         15    132
 12     90          5     35
  9     74         14     96
 12     77          6     50
 16    144          3     24
 11    110          5     26
 10     99          8     95
 13    109          6     73
 15    109         17    159

(a)  Compute the means, the standard deviations, and the standard
     errors of the means of X and Y.
(b)  Make a scatter diagram to show the relation between these two
     series. Also, draw one line through the plotted data showing the
     mean of X and another showing the mean of Y.
(c)  Fit a straight line to the points on the scatter diagram in order to
     express mathematically the average relationship between these
     two variables. The required equation is $\hat{Y} = b_0 + b_1 X$.
     This calls for the computation of:
     (1) the regression coefficient $b_1 = \sum xy / \sum x^2$,
     (2) the Y-intercept $b_0 = \bar{Y} - b_1\bar{X}$.
     The regression equation may be written
     $\hat{Y} = \bar{Y} + b_1(X - \bar{X})$. Find $b_0$ and $b_1$
     geometrically from the graph.
(d)  Calculate the estimated value of Y for each of the 30 values of X
     from the equation $\hat{Y} = b_0 + b_1 X$. Also, compute the errors
     of estimate $(Y - \hat{Y})$ for each X.
(e)  Interpret the constants $b_0$ and $b_1$ obtained for
     $\hat{Y} = b_0 + b_1 X$.
(f)  Compute and interpret the standard error of estimate from the
     formula $s_E = \sqrt{\sum (Y - \hat{Y})^2 / (n - 2)}$.
(g)  Compute the sum of squares of the errors of estimate (deviations
     from regression) with the formula
(h)  Test the regression coefficient, $b_1$, for significance.
(i)  Compute the correlation coefficient using the formula
(j)  Compute and interpret the coefficient of determination, $r^2$.
(k)  Partition $\sum y^2$ into two parts: that associated with
     regression, and that attributed to errors of estimate.
(l)  Compute the correlation coefficient between X and $\hat{Y}$.
(m)  Compute the correlation coefficient between Y and $\hat{Y}$.
(n)  Compute the correlation coefficient between x and y.
(o)  Compute and interpret the 95 per cent confidence limits of $\beta_1$.
(p)  Compute the standard errors for the estimated values of Y for
     each of the following:
     (1) the mean of all Y's whose X value is equal to 10.
     (2) particular Y's whose X value is equal to 10.
(q)  Compute the sum of squares attributed to regression using the
     formula $\sum (\hat{Y} - \bar{Y})^2$. The short-cut formula is
     $(\sum xy)^2 / \sum x^2$, or $r^2 \sum y^2$. Show computationally
     that the three formulas give the same sum of squares.
(r)  Show computationally that $(1 - r^2)\sum y^2 = \sum (Y - \hat{Y})^2$.
(s)  Compute the regression of X on Y; that is, compute the constants
     in the equation $\hat{X} = b_0' + b_1' Y$, where
     $b_1' = \sum xy / \sum y^2$ and $b_0' = \bar{X} - b_1'\bar{Y}$.
     Plot the regression on the same sheet on which the regression
     $\hat{Y} = b_0 + b_1 X$ was plotted.
(t)  Show that $r^2 = b_1 b_1'$, where $b_1$ and $b_1'$ are the two
     regression coefficients.
(u)  Compute
(v)  Show logically, algebraically, or geometrically that | r | cannot be
     less than 0 nor greater than 1.
9.9        We  have  this  sample  of  X  and  Y  values: 


9 

4 

11 

2 

7 

5 

10 

1 

8 

3 
(a)  Compute the product-moment correlation between Y and X for
     this sample.
(b)  What assumptions are required for testing the significance of a
     sample value of r? What parameter is estimated by the sample
     correlation?
(c)  Indicate or describe three methods for testing the hypothesis that
     the true value of the correlation is 0 in the bivariate population
     from which the above sample was taken. (Exact formulas are not
     required.)

9.10  Management seeks to discover a measure of correlation between length
of service on the part of a certain type of machine and the annual
repair bills on such machines. From the following data:

              Years of     Annual Repair
Machine       Service      Cost

A                1           $2.00
B                3            1.50
C                4            2.50
D                2            2.00
E                5            3.00
F                8            4.00
G                9            4.00
H               10            5.00
I               13            8.00
J               15            8.00

(a)  Make a scatter diagram, designating years of service as the X
     series and annual repair costs as the Y series.
(b)  Find the correlation coefficient r.
(c)  Is the measure of correlation significant?
     What are your assumptions?

9.11  Given that

$\sum y^2 = 1000$

and that the sum of squares due to regression is 640, compute the value
of r showing all your steps. What assumptions are necessary if r is to be
interpreted as a sample estimate of a population correlation coefficient?

9.12  The correlation coefficient between the C.A.V.D. Vocabulary and the
Graduate Record Verbal tests was 0.60 for a sample of 67 men students
and 0.50 for a sample of 39 women students. With a risk of Type I error
of 5 per cent, is this evidence that the two groups are random samples
from bivariate normal populations of the same correlation?

9.13  Given the following data and statistics for a random sample from a
bivariate normal distribution:


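$\bar{X} = 6$       $\sum x^2 = 100$       $b_1 = -4$       $s_E = 6.708$
$\bar{Y} = 20$      $\sum y^2 = 2500$      $r = -0.8$       $s_{b_1} = 0.6708$
$n = 22$            $\sum xy = -400$       $b_0 = 44$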

(a)  Give a detailed interpretation of the linear regression of Y on X.
     Include all inferences that can be made about the population
     regression. Also, interpret all inferences made.
(b)  Interpret the above correlation coefficient.
(c)  What assumptions are implicit in the use of the regression in (a)?
9.14  Using the results below, test the hypothesis
$H\colon \rho_i = \rho$ $(i = 1, \ldots, 7)$. Also run through the
series of tests outlined in Section 8.26. State the assumptions made in
each case.

REGRESSION AND CORRELATION DATA IN SEVEN TYPES OF SHEETING

         Degrees                                        Correla-   Regres-    Degrees
         of                                             tion       sion       of       Sum of     Mean
Fabrics  Freedom  $\sum x^2$   $\sum xy$    $\sum y^2$  Coeff.     Coeff.     Freedom  Squares    Square

1          139     60357.14      -989.64     1965.89    -0.0909    -0.0164      138     1949.66
2          139     60357.14     -1970.43     2351.43    -0.1654    -0.0326      138     2287.10
3          139     60357.14     -1647.50     3190.85    -0.1186    -0.0273      138     3145.88
4          139     60357.14      -192.86     3258.61    -0.0138    -0.0032      138     3257.99
5          139     60357.14     -5482.14     2804.04    -0.4214    -0.0908      138     2306.11
6          139     60357.14     -7605.00     2276.79    -0.6487    -0.1260      138     1318.56
7          139     60357.14    -12458.50     4375.60    -0.7666    -0.2064      138     1804.00

Total      973    422499.98    -30346.07    20223.21               -0.5028      966    16069.30    16.63
                                                                                972    20201.41

Difference for testing among regression coefficients                              6     4132.11   688.68

F = 688.68/16.63 = 41.41
CHAPTER    10

DESIGN  OF  EXPERIMENTAL 
INVESTIGATIONS 

BEFORE PROCEEDING to the introduction and discussion of further
techniques of statistical analysis, time will be taken to examine certain
aspects of data acquisition. Such a digression, if it really is a
digression, is justified because the analysis of any set of data is
dictated (to a large extent) by the manner in which the data were
obtained. The truth of the foregoing statement will be illustrated many
times throughout the remainder of this book.

10.1  SOME  GENERAL   REMARKS 

It has been well demonstrated in the preceding chapters that
statistics (as a science) deals with the development and application of
methods and techniques for the collection, tabulation, analysis, and
interpretation of data so that the uncertainty of conclusions based
upon the data may be evaluated by means of the mathematics of
probability. However, it should also be evident that there is something
more to statistics than the routine analysis of data using standard
techniques. For example, the reader should realize that the analyses
are exact only if all the underlying assumptions are satisfied. Since
this is rarely true, much depends on the skill of the researcher in
selecting the method of analysis which best fits the circumstances of the
experimental situation being studied. Thus, it seems safe to say that
statistics is an art as well as a science.

10.2  WHAT IS MEANT BY "THE DESIGN OF AN EXPERIMENT"?

Designing  an  experiment  simply  means  planning  an  experiment  so 
that  information  will  be  collected  which  is  relevant  to  the  problem 
under  investigation.  All  too  often  data  are  collected  which  turn  out  to 
be  of  little  or  no  value  in  any  attempted  solution  of  the  problem.  The 
design  of  an  experiment  is,  then,  the  complete  sequence  of  steps  taken 
ahead  of  time  to  insure  that  the  appropriate  data  will  be  obtained  in  a 
way  which  permits  an  objective  analysis  leading  to  valid  inferences 
with  respect  to  the  stated  problem.  Such  a  definition  of  designing  an 
experiment  implies,  of  course,  that  the  person  formulating  the  design 
clearly  understands  the  objectives  of  the  proposed  investigation. 

10.3  THE  NEED   FOR  AN    EXPERIMENTAL   DESIGN 

That some sort of design is necessary before any experiment is
performed may be demonstrated by considering an example.

Example  10.1

It  is  desired  to  determine  the  effect  of  gasoline  and  oil  additives  on 
carbon  and  gum  formation  of  engines.1  Twenty  additives  are  to  be 
tested  in  combination  with  a  "control"  gasoline  and  oil  mixture.  Eighty 
similar  engines  are  available  for  use  in  the  experimental  program. 

As the problem is now stated it is far too general to permit the
selection of a particular design. Many questions must be asked (and
answers obtained) before the statistician can propose a suitable design.
Typical questions are:

(1)  How is the effect to be measured? That is, what are the
characteristics to be analyzed?

(2)  What  factors  influence  the  characteristics  to  be  analyzed? 

(3)  Which  of  these  factors  will  be  studied  in  this  investigation? 

(4)  How  many  times  should  the  basic  experiment  be  performed? 

(5)  What  should  be  the  form  of  the  analysis? 

(6)  How  large  an  effect  will  be  considered  important? 

When we recognize that the foregoing questions are only a small sample
of those that might be asked, it is evident that much thought should be
given to the planning stage in any experimental investigation. In fact,
the importance of this recommendation cannot be overemphasized.

10.4     THE   PURPOSE  OF  AN    EXPERIMENTAL   DESIGN 

The purpose of any experimental design is to provide a maximum
amount of information relevant to the problem under investigation.
However, it is also important that the design, or plan, or test program,
be kept as simple as possible. Further, the investigation should be
conducted as efficiently as possible. That is, every effort should be made
to conserve time, money, personnel, and experimental material.
Fortunately, most of the simple statistical designs are not only easy to
analyze but also are efficient in both the economic and statistical
senses. For this reason, a statistician should be consulted in the early
stages of any proposed research project. He can often recommend a
simple design which is both economical and efficient.

Having said that the purpose of any experimental design is to
provide a maximum amount of information at minimum cost, it is evident
that the design of experiments is a subject which involves both
statistical methodology and economic analysis. A person planning an
experiment should incorporate both of these features into his design.
That is, he should strive for statistical efficiency and resource economy.
However, an examination of books on statistical methods and the
design of experiments will seldom reveal many explicit references to
the cost aspects of the problem. This is unfortunate. On the other hand,
the subject of cost is implicit in most discussions of experimental design.
We have only to note the continual attempts to plan experiments using
the smallest size sample possible to realize that the cost aspect has not
been overlooked. Fortunately, as we have already observed, most
simple designs are both economical and efficient, and thus the
statistician's efforts to achieve statistical efficiency usually also lead to
economy of experimentation.

1  Projects and Publications of the National Applied Mathematics Laboratories,
April through June, 1949, p. 79.

10.5  BASIC   PRINCIPLES  OF   EXPERIMENTAL   DESIGN 

It has been stated many times that there are three basic principles
of experimental design: replication, randomization, and local control.
Because of the fundamental nature of these concepts, each will be
discussed separately. Further, it is recommended that the reader strive for
as complete an understanding and appreciation of these ideas as
possible, for they will play a very important role in much of the remainder
of this book.

10.6  REPLICATION 

By replication we mean the repetition of the basic experiment. The
reasons why replication is desirable are: (1) It provides an estimate of
experimental error which acts as a "basic unit of measurement" for
assessing the significance of observed differences or for determining the
length of a confidence interval. (2) Since, under certain assumptions,
experimental error may be estimated in the absence of replication, it is
also fair to state that replication sometimes provides a more accurate
estimate of experimental error. (3) It enables us to obtain a more
precise estimate of the mean effect of any factor since
$\sigma_{\bar{Y}}^2 = \sigma^2/n$. (In the formula just quoted,
$\sigma^2$ represents the true experimental error and n
the number of replications.)
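The third reason may be illustrated numerically. The sketch below simply evaluates the square root of $\sigma^2/n$ for several numbers of replications; the value of sigma is hypothetical and the function name `standard_error` is introduced only for this illustration.

```python
import math

SIGMA = 4.0  # hypothetical true experimental error (as a standard deviation)


def standard_error(sigma, n):
    """Standard error of a mean based on n replications: sqrt(sigma^2 / n)."""
    return sigma / math.sqrt(n)


for n in (1, 2, 4, 8, 16):
    print(f"n = {n:2d}   standard error of mean = {standard_error(SIGMA, n):.3f}")
```

Because the standard error varies inversely with the square root of n, halving it requires four times as many replications, which is one reason the cost of added precision rises quickly.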

It  must  be  emphasized  that  multiple  readings  do  not  necessarily 
represent  true  replication.  This  statement  may  best  be  substantiated 
by  an  example. 

Example  10.2 

Two manufacturing processes are used to produce thermal batteries.
Sample batteries are obtained from each of two production lots, one
lot being produced by process A and the other by process B. The
batteries are then tested and the activated life of each battery is
recorded.

If  an  analysis  of  the  above  experiment  were  attempted,  it  would  be 
discovered  that  no  valid  estimate  of  error  is  available  for  testing  the 
difference  between  processes.  The  variation  among  batteries  within 
lots  yields  a  valid  estimate  of  error  for  assessing  only  the  lot-to-lot 
variability.  True  replication  would  require  that  batteries  be  tested 
from  each  of  several  lots  manufactured  by  each  process.  (NOTE:  In 
the  example  just  given,  the  effects  of  lots  and  processes  are  said  to  be 
confounded.  This  term  will  be  discussed  more  fully  a  little  later.) 
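The consequence of mistaking batteries within a lot for true replicates can be seen in a small simulation. Everything in the sketch below is assumed for illustration: the two standard deviations, the numbers of lots and batteries, and the function names are hypothetical, and no real difference between processes is built in, so every "significant" result is a false alarm.

```python
import random
from statistics import mean, variance

LOT_SD = 2.0       # hypothetical lot-to-lot standard deviation
BATTERY_SD = 1.0   # hypothetical battery-to-battery standard deviation within a lot


def make_lot(n_batteries, rng):
    """Simulate activated lives for one lot; the process effect is zero."""
    lot_effect = rng.gauss(0.0, LOT_SD)
    return [lot_effect + rng.gauss(0.0, BATTERY_SD) for _ in range(n_batteries)]


def t_statistic(a, b):
    """Pooled two-sample t statistic for equal-sized samples a and b."""
    n = len(a)
    pooled = (variance(a) + variance(b)) / 2
    return (mean(a) - mean(b)) / (2 * pooled / n) ** 0.5


def false_alarm_rates(n_trials=2000, seed=1):
    rng = random.Random(seed)
    naive = proper = 0
    for _ in range(n_trials):
        # One lot per process, 10 batteries each: batteries treated as replicates.
        a, b = make_lot(10, rng), make_lot(10, rng)
        naive += abs(t_statistic(a, b)) > 2.101      # t(0.975) with 18 d.f.
        # Five lots per process, 2 batteries each: lot means are the replicates.
        a = [mean(make_lot(2, rng)) for _ in range(5)]
        b = [mean(make_lot(2, rng)) for _ in range(5)]
        proper += abs(t_statistic(a, b)) > 2.306     # t(0.975) with 8 d.f.
    return naive / n_trials, proper / n_trials


naive_rate, proper_rate = false_alarm_rates()
print(f"one lot per process:      {naive_rate:.2f}")
print(f"several lots per process: {proper_rate:.2f}")
```

Under these assumptions the first rate is far above the nominal 5 per cent, because the within-lot variation leaves out the lot-to-lot component, while the second stays near 5 per cent because lots, the true experimental units, supply the error estimate.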

Sometimes the absence of true replication is more easily recognized
than in Example 10.2. For instance, if multiple measurements of
activated life had been obtained by connecting several clocks to a single
battery, the researcher would easily have recognized that the observed
data were not true replications but only repeated measurements on the
same experimental unit. Another example of the same type of spurious
replication (i.e., multiple measurements rather than true replication)
would be multiple determinations of the silicon content of a particular
batch of pig iron where the variability among processes was to be
assessed.

10.7   EXPERIMENTAL   ERROR   AND   EXPERIMENTAL 
UNITS 

In the preceding discussion of replication, the terms experimental
error and experimental unit were used. Because of their wide usage, it
is necessary to have a clear understanding of their meanings. An
experimental unit is that unit to which a single treatment (which may be a
combination of many factors) is applied in one replication of the basic
experiment. The term experimental error describes the failure of two
identically treated experimental units to yield identical results.

At the risk of saying too much and thus confusing the reader, it is
my belief that some discussion of the preceding definitions is in order.
In one respect, the term "experimental error" is unfortunate, especially
the word "error." This word is probably a legacy from the physical
sciences, particularly astronomy, where the investigators (observers) were
concerned with errors in both measurement and observation. However,
the influence of experimenters in both the biological and physical
sciences should not be discounted entirely. The adoption of the word
"error" could just as easily be attributed to them, for they clearly
recognized the existence of errors of technique in the performance of
their experiments. But whatever the history of the word "error," a
thoughtful examination of the definition of the term "experimental
error" will reveal that its meaning to the statistician is much more
general. In each particular situation, it reflects: (1) errors of
experimentation, (2) errors of observation, (3) errors of measurement,
(4) the variation of the experimental material (i.e., among
experimental units), and (5) the combined effects of all extraneous factors
which could influence the characteristics under study but which have
not been singled out for attention in the current investigation.

There is another item related to the term experimental error which is
sometimes confusing to the statistical novice. This is the practice of the
professional statistician of referring to "the experimental error for
testing a particular effect." Such a phrase suggests that, in a given
experiment, there may be more than one experimental error even though
examination of the assumed statistical model will reveal only one such
term. As confusing as this practice may be to the uninitiated, it serves
a useful purpose. As the reader progresses through the remainder of this
book, he will become more familiar with the way in which the
expression is used and thus, I hope, become more tolerant of what seems at
the moment to be an unwise use of words that have been carefully
defined. In an attempt to give a somewhat more specific defence at this
time, let me say that all the statistician is really doing is reminding you
of the fact that every statistic has its own standard error. Perhaps his
choice of words is not the best, but it is a firmly entrenched part of the
language of experimental design. Thus, I strongly recommend that you
forgive the statistician his choice of words and that you concentrate on
the more important task of learning how and when to use statistical
methods.

Before terminating this discussion of experimental error, ways of
reducing its magnitude should be indicated. The following statements
are, of course, only general recommendations, for specific
recommendations can be made only when a particular design problem is being
considered. Experimental error may usually be reduced by adoption of
one or more of the following techniques: (1) using more homogeneous
experimental material or carefully stratifying the available material,
(2) utilizing information provided by related variates, (3) using more
care in conducting the experiment, (4) using a more efficient
experimental design.

10.8      CONFOUNDING 

In Section 10.6, the word "confounded" was introduced to describe
a certain phenomenon which is fairly common in experimentation.
Since this phenomenon is so important in the design of experiments, it
is appropriate that time be taken to investigate and describe it more
thoroughly. This will best be done through the use of examples.

Example  10.3 

A chemist has developed a new synthetic fertilizer and wishes to
compare it with an established product. He contacts a nearby university
and they agree to run an experiment on two available experimental
plots. The established product will be applied to one plot of ground and
the experimental product to the other. The characteristic to be
measured and used as the index of performance will be the yield (converted
to bushels per acre) of a specified cereal crop. However, when the two
yields are compared, we are unable to say how much of the difference
is due to fertilizers and how much is due to inherent differences (in
fertility, soil type, etc.) between the two plots. That is, any comparison of
fertilizers is said to be confounded with a comparison of plots or, in
slightly different words, the effects of fertilizers and plots are
confounded.

Example  10.4 

An analyst is engaged in determining the percentage of iron in
chemical compounds. Two different procedures are to be compared. The
analyst takes a sample of the first chemical compound and makes a
determination of the iron content using procedure A. Then he makes a
determination using procedure B. This sequence (that is, first A and
then B) of steps is repeated several times, each time on a new sample
from a different compound. But here again, as in Example 10.3, we are
troubled by the existence of confounding. Any comparison of the two
procedures (A and B) will be confounded with a comparison of the first
and second determinations made (on each compound) by the analyst.
That is, if there is any improvement in technique (due to a learning
process) from the first to the second determination, this effect will be
confounded with the difference between procedures.

Examination of the preceding examples will show that the word
"confounded" is simply a synonym for "mixed together." That is, two (or
more) effects are said to be confounded in an experiment if it is
impossible to separate the effects when the subsequent statistical
analysis is performed.

Since one of the purposes of experimental design is to provide
unambiguous results, it would seem almost obvious that a good design
should avoid confounding. It is, therefore, disconcerting to the
uninitiated to learn that the statistician frequently deliberately
introduces confounding into a design. However, as you will see later,
such a procedure is not followed indiscriminately. When confounding is
introduced into a design, it is done for a good reason, and the reason
is, as often as not, to achieve economy through reduction of the size of
the experiment.

10.9      RANDOMIZATION 

It was noted in Section 10.6 that replication provides an estimate of
experimental error which can be used for assessing the significance of
observed differences. That is, replication makes a test of significance
possible. But what makes such a test valid? We have seen that every
test procedure has certain underlying assumptions which must be
satisfied if the test is to be valid. Perhaps the most frequently
invoked assumption is the one which states that the observations (or
the errors therein) are independently distributed. How can we be
certain that this assumption is true? We cannot, but by insisting on a
random sample from a population or on a random assignment of treatments
to the experimental units, we can proceed as though the assumption is
true. That is, randomization makes the test valid by making it
appropriate to analyze the data as though the assumption of independent
errors is true. Note that we have not said randomization guarantees
independence, but only that randomization permits us to proceed as
though independence is a fact. The reason for this distinction should
be clear: Errors associated with experimental units that are adjacent
in space or time will tend to be correlated, and all that randomization
does is to assure us that the effect of this correlation on any
comparison among treatments will be made as small as possible. Some
degree of correlation will still remain, for no amount of randomization
can ever eliminate it entirely. That is, in any experiment, true and
complete independence of errors is an ideal that can never be achieved.
However, such independence should be sought, and randomization is the
best technique devised so far to attain the desired end.

Sometimes the concept of randomization is introduced as a device
for "eliminating" bias. To illustrate the thinking back of this approach,
consider again Example 10.4. There, any comparison of procedures A
and B would be biased in favor of B if a learning effect existed.
However, if each time a new compound was to be investigated the analyst
had decided at random which procedure to use first, the bias would
have been reduced, perhaps even eliminated. But even more would
have been accomplished. If there were other biases operating, these
would also have had their effects eliminated (or at least reduced) by
the randomization. That is, by randomly assigning treatments to the
experimental units, we try to make certain that treatments will not be
continually favored or handicapped by extraneous sources of variation
over which the experimenter has no control or over which he chooses
not to exercise control. In other words, randomization is like insurance;
it is always a good idea, and sometimes it is even better than we expect.
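
The random choice of which procedure to use first can be sketched in a few lines of code. The following is only an illustration under stated assumptions: the function name `randomize_order`, the use of 8 compounds, and the seed are hypothetical and do not come from Example 10.4.

```python
import random


def randomize_order(n_compounds, procedures=("A", "B"), seed=None):
    """Return one randomly ordered tuple of procedures per compound."""
    rng = random.Random(seed)  # a seed is used only to make the plan reproducible
    orders = []
    for _ in range(n_compounds):
        order = list(procedures)
        rng.shuffle(order)  # each ordering of the procedures is equally likely
        orders.append(tuple(order))
    return orders


# Hypothetical run: 8 compounds, two procedures per compound.
orders = randomize_order(8, seed=1)
for compound, order in enumerate(orders, start=1):
    print(f"compound {compound}: first {order[0]}, then {order[1]}")
```

Because the order is decided afresh for each compound, a learning effect (or any other influence tied to order) would not be expected to favor one procedure systematically.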

Regardless of the foregoing arguments in favor of randomization,
there have been (in the past) persons who have spoken out in favor of
systematic (nonrandom) designs. "Can we not," they ask, "obtain a
more accurate measurement of differences among treatments if such
treatments are applied to the experimental units in a systematic
manner?" The only honest answer to this query is, "Possibly." Why, then,
does the statistician insist on randomization? The reason is, of course,
the same as expressed earlier: It is because the statistician wishes to
make certain inferences from the observed data and he desires to
attach a measure of reliability to these inferences. If randomization is
not employed, the quoted measure of reliability may be biased.
Further, any inference would be unsupported by a meaningful probability
statement. (NOTE: The reader is reminded of the discussion of
judgment versus random samples presented in Section 4.2.)

There are, of course, situations in which complete randomization is
either impossible or uneconomical. The statistician should not,
therefore, adopt the unrelenting position of insisting on complete
randomization in every case. On the other hand, neither should he agree
to the use of a completely systematic design, for the experimenter must
reconcile himself to the fact that some degree of randomization is
required for the valid application of most statistical analyses.
Clearly, some intermediate position between the two extremes² of
complete randomization or a strictly systematic design is often most
realistic. Once the experimenter and the statistician recognize one
another's problems, a compromise plan can usually be found which is
mutually satisfactory.

² The question of which is better, a systematic or a randomized design, has
never been completely settled. Most likely it never will be settled. Most designs
in common use today involve both systematic and random elements, and this
seems a reasonable state of affairs. For the person who wishes to pursue this
point farther, the literature offers many papers discussing the argument, both
pro and con. See references (2, 27, 35, 36, 44).

10.10      LOCAL  CONTROL

In Section 10.5, it was stated that the three basic principles of
experimental design are replication, randomization, and local control.
The first two of these basic principles have already been discussed and
it is now appropriate that time be devoted to the third.

In one sense, local control is synonymous with experimental design.
However, this interpretation of experimental design is very narrow, and
not consistent with our earlier definition. If we agree, then, that
experimental design is as defined in Section 10.2, then local control is
only a part of the total complex. In this sense, local control refers to
the amount of balancing, blocking, and grouping of the experimental
units that is employed in the adopted statistical design. It was
observed earlier (Section 10.9) that replication and randomization make
a valid test of significance possible. What, then, is the function of
local control? The function, or purpose, of local control is to make the
experimental design more efficient. That is, local control makes any
test of significance more sensitive or, in the language of Section 7.1,
it makes the test procedure more powerful. This increase in efficiency
(or sensitivity or power) results because a proper use of local control
will reduce the magnitude of the estimate of experimental error. (NOTE:
The reader should recognize that local control can be exerted in several
ways. The more common methods have been suggested above and in
the last paragraph of Section 10.7.)

10.11       BALANCING,    BLOCKING,   AND   GROUPING 

In the preceding section, the terms balancing, blocking, and grouping
were introduced in connection with the principle of local control.
Rather than leave these words undefined, a few sentences of
explanation will be given so that the researcher will understand what is
implied. Actually, it is possible to say that the three terms are
synonymous. However, in this text we shall use them to describe
different aspects of design philosophy. It is hoped that this will not
lead to confusion when other references are consulted.

By grouping will be meant the placing of a set of homogeneous
experimental units into groups in order that the different groups may be
subjected to different treatments. These groups may, of course,
consist of different numbers of experimental units.

Example  10.5 

A  pharmaceutical  company  is  investigating  the  comparative  effects 
of  three  proposed  compounds.  The  experiment  will  consist  of  injecting 
rats  with  the  compounds  and  recording  the  pertinent  reaction.  A  litter 
consisting  of  11  rats  (experimental  units)  is  available.  Each  of  the  11 
rats  is  assigned  at  random  to  one  of  three  groups  subject  only  to  the 
restriction  that  the  three  groups  contain  4,  4,  and  3  rats,  respectively. 
The  animals  in  the  first  group  are  then  injected  with  compound  A, 
those  in  the  second  group  with  compound  B,  and  those  in  the  third 
group  with  compound  C. 
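
The random assignment described in Example 10.5 might be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `assign_to_groups`, the rat identifiers 1 through 11, and the seed are assumptions introduced here.

```python
import random


def assign_to_groups(units, group_sizes, labels, seed=None):
    """Randomly split `units` into labeled groups of the requested sizes."""
    if sum(group_sizes) != len(units):
        raise ValueError("group sizes must add up to the number of units")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # random permutation of the experimental units
    groups = {}
    start = 0
    for label, size in zip(labels, group_sizes):
        groups[label] = shuffled[start:start + size]
        start += size
    return groups


# Hypothetical identifiers for the 11 rats of the litter.
rats = list(range(1, 12))
groups = assign_to_groups(rats, [4, 4, 3], ["A", "B", "C"], seed=0)
for compound, members in groups.items():
    print(f"compound {compound}: rats {sorted(members)}")
```

Shuffling the whole litter once and then cutting the shuffled list into pieces of sizes 4, 4, and 3 honors the only restriction stated in the example, namely the group sizes.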

By  blocking  will  be  meant  the  allocation  of  the  experimental  units  to 
blocks  in  such  a  manner  that  the  units  within  a  block  are  relatively 
homogeneous  while  the  greater  part  of  the  predictable  variation  among 
units  has  been  confounded  with  the  effect  of  blocks.  That  is,  using  the
researcher's  prior  knowledge  concerning  the  nature  of  the  experimental 
units,  the  statistician  can  design  the  experiment  in  such  a  way  that 
much  of  the  anticipated  variation  will  not  be  a  part  of  experimental 
error.  In  this  way,  a  more  efficient  design  is  provided. 

Example  10.6 

Consider again the problem outlined in Example 10.5. This time,
however, let us assume that 12 rats are available and that the pedigrees
show 6 of them are from litter X, 3 are from litter Y, and 3 from litter Z.
Since it may well be expected that rats in the same litter will perform
more nearly alike than rats from different litters (due to inherited
characteristics), it would seem natural to form three blocks. The first
block would contain the 6 rats from litter X, the second block would
contain the 3 rats from litter Y, and the third block would contain the 3
rats from litter Z. The three treatments (A, B, and C) would then be
assigned at random to the rats within blocks. Since each rat is subjected
to only one treatment, the block containing 6 rats would undoubtedly
end up with 2 rats receiving treatment A, 2 receiving treatment B, and 2
receiving treatment C. The other two blocks would have single rats
receiving each treatment.
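
The randomization within blocks described in Example 10.6 might be sketched as follows. The function name `randomize_within_blocks`, the rat identifiers (X1, Y1, and so on), and the seed are hypothetical. The sketch also assumes, as in the example, that each block size is a multiple of the number of treatments.

```python
import random
from collections import Counter


def randomize_within_blocks(blocks, treatments, seed=None):
    """Assign treatments at random to units, separately within each block."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        reps, remainder = divmod(len(units), len(treatments))
        if remainder:
            raise ValueError(f"block {block} cannot hold equal replicates")
        labels = list(treatments) * reps  # equal replication inside the block
        rng.shuffle(labels)
        plan[block] = dict(zip(units, labels))
    return plan


# Hypothetical identifiers: 6 rats from litter X, 3 from Y, and 3 from Z.
blocks = {
    "X": ["X1", "X2", "X3", "X4", "X5", "X6"],
    "Y": ["Y1", "Y2", "Y3"],
    "Z": ["Z1", "Z2", "Z3"],
}
plan = randomize_within_blocks(blocks, ("A", "B", "C"), seed=0)
for block, assignment in plan.items():
    print(block, assignment, dict(Counter(assignment.values())))
```

Each litter is randomized on its own, so differences among litters fall on the blocks rather than on the comparison of treatments.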

By balancing will be meant the obtaining of the experimental units,
the grouping, the blocking, and the assignment of the treatments to the
experimental units in such a way that a balanced configuration results.
(Circular though the preceding definition is, I feel it projects the
thought I wish to impart. Consequently, I hope you will forgive the
poor logic.) It should be clear that we can have little or no balance,
partial balance, approximate balance, or complete balance in any
particular design. For instance, Example 10.5 illustrates a case of
approximate balance, while Example 10.6 might be construed as an
illustration of partial balancing. Rather than go on to manufacture
further examples at this time, let us defer the matter until later. As
you progress through the chapters on various designs which follow, it
will become abundantly clear that the statistician continually strives
for balanced designs. Thus, examples of completely balanced designs will
be available in excess.

10.12     TREATMENTS  AND  TREATMENT  COMBINATIONS 

Several  times  in  the  preceding  sections,  the  word  "treatments"  has 
been  used  with  little  or  no  explanation.  Just  what  is  meant  by  this 
word?  Like  so  many  other  terms  in  statistics,  the  word  "treatments" 
entered  the  literature  because  of  its  use  in  agronomic  experimentation. 
However,  the  word  "treatments"  (like  "blocks"  and  "plots")  has  long 
since  lost  its  strict  agronomic  connotation.  In  fact,  the  three  phrases 
mentioned  in  the  preceding  sentence  are  now  an  accepted  part  of  the 
language  of  statistics,  regardless  of  the  area  of  application. 

To the statistician, the word treatment (or treatment combination)
implies the particular set of experimental conditions which will be
imposed on an experimental unit within the confines of the chosen design.
By way of explanation, several illustrations will now be given:

(1) In agronomic experimentation, a treatment might refer to:
    (a) a brand of fertilizer, (b) an amount of fertilizer, (c) a depth
    of seeding, or (d) a combination of (b) and (c). The latter
    example would more properly be termed a treatment
    combination.

(2) In animal nutrition experimentation, a treatment might refer
    to: (a) the breed of sheep, (b) the sex of the animals, (c) the
    sire of the experimental animal, or (d) the particular ration
    fed to an animal.

(3) In psychological and sociological studies, a treatment might
    refer to: (a) age, (b) sex, or (c) amount of education.

(4) In an investigation of the effects of various factors on the
    efficiency of washing clothes in the home, the treatments were
    various combinations of: (a) the type of water (hard or soft),
    (b) temperature of water, (c) length of wash time, (d) type of
    washing machine, and (e) kind of cleansing agent.

(5) In an experiment to study the yield of a certain chemical
    process, the treatments might be all combinations of: (a) the
    temperature at which the process was operated and (b) the
    amount of catalyst used.

(6) In a research and development study concerned with batteries,
    the treatments could be various combinations of: (a) the
    amount of electrolyte and (b) the temperature at which the
    battery was activated.

Many more examples could be cited from every field in which
experimentation is performed. However, later chapters will abound with
such examples. Thus, it seems best that we move on to other matters.

10.13      FACTORS,    FACTOR    LEVELS,   AND   FACTORIALS

In any discussion of experimental design, the word "factorial" is
almost certain to be heard. Frequently, the reference is to a "factorial
design." However, this is actually a misnomer. There is no such thing
as a factorial design. The adjective "factorial" refers to a special way
in which treatment combinations are formed and not to any basic type
of design. Thus, if a randomized complete block design³ has been
selected and the treatment combinations are of a factorial nature, a
more correct expression would be "a randomized complete block design
involving a factorial treatment arrangement." Some writers, such as
Yates (46), have recognized this situation and they speak of factorial
experiments rather than factorial designs. This shift in terminology,
while in the proper direction, does not completely resolve the difficulty
since the word "experiment" seems to imply that survey data are to be
excluded. To avoid any such implication, we shall speak not of factorial
designs nor of factorial experiments, but simply of factorials. It is to be
understood, of course, that this is only an abbreviation for a more
lengthy expression describing the nature of the treatments.

³ See Chapter 12 for a definition of this type of design.

Having introduced the subject of factorials, it is desirable that
specific terms be defined in an explicit manner. This will now be done.

In most investigations, the researcher is concerned with more than one
independent variable and with the changes that occur in the dependent
variable as one or more of the independent variables are permitted to
vary. In the language of experimental design, an independent variable
is referred to as a factor. Referring to the illustrations in the preceding
section, it is noted that five factors were listed for the home washing
study, while the battery study involved only two factors. The reader
can easily find many more examples of investigations involving several
factors by consulting various technical journals.

Before proceeding to the definition of the next term arising in
connection with factorials, it will be wise to indicate the generally
accepted notation used to represent factors. Most writers use lower case
Latin letters to represent factors. As an illustration, the five factors
in the home washing experiment might be represented by

m = type of washing machine
a = kind of cleansing agent
b = type of water
c = temperature of water
d = length of wash time.

A second illustration is provided by an investigation conducted by
Ratner (40). His experiment involved a study of how long it took to
perform a certain move, and the factors investigated were

d  =  distance 
w  =  weight 
o  =  operator-pair 

It was mentioned earlier that the researcher is generally interested
in experimental results (observations on the dependent variables) as
one or more factors are allowed to vary. It will be seen in Ratner's study
that he considered 3 distances ($d_1, d_2, d_3$), 10 weights
($w_1, \ldots, w_{10}$), and 4 operator-pairs ($o_1, o_2, o_3, o_4$). In
the home washing experiment, the investigator used 2 types of machine,
2 kinds of cleansing agent, 2 types of water, 2 temperatures of water,
and 2 lengths of wash time. These various values, or classifications of
the factors, are known as the levels of the factors. That is, there were
10 levels of weight, 3 levels of distance, and 4 levels of operator-pairs
in Ratner's experiment. In the home washing study, each factor appeared
at 2 levels. These two examples should indicate that the word "level" is
a very general term which may be applied in many varied situations.
Ratner's investigation of move times provides an excellent example of
this diversity, for the 3 levels of distance (6, 12, and 18 inches) are
values of a continuous variable, while the 4 levels of operator-pairs
(i.e., 4 distinct pairs of operators formed from 8 individuals) are
classifications of a qualitative variable.

Since so many experiments involve factorial treatment arrangements,
it is necessary that some notation be adopted to represent the
various treatment combinations. Unfortunately, several systems of
notation appear in the literature. These are summarized in Table 10.1
for a case involving 12 treatment combinations where the 12
combinations were formed from 2 levels of factor a, 2 levels of factor b,
and 3 levels of factor c. In this representation, using Method I as an
example, the symbol $a_ib_jc_k$ ($i = 1, 2$; $j = 1, 2$; $k = 1, 2, 3$)
represents the treatment combination formed by using the ith level of
factor a, the jth level of factor b, and the kth level of factor c.

TABLE 10.1-Illustrations of Notations Used To
Represent Factorial Treatment Combinations

Treatment                           Method
Combination    I              II     III            IV     V*
     1         $a_1b_1c_1$    111    $a_0b_0c_0$    000    $(1)$
     2         $a_1b_1c_2$    112    $a_0b_0c_1$    001    $c$
     3         $a_1b_1c_3$    113    $a_0b_0c_2$    002    $c^2$
     4         $a_1b_2c_1$    121    $a_0b_1c_0$    010    $b$
     5         $a_1b_2c_2$    122    $a_0b_1c_1$    011    $bc$
     6         $a_1b_2c_3$    123    $a_0b_1c_2$    012    $bc^2$
     7         $a_2b_1c_1$    211    $a_1b_0c_0$    100    $a$
     8         $a_2b_1c_2$    212    $a_1b_0c_1$    101    $ac$
     9         $a_2b_1c_3$    213    $a_1b_0c_2$    102    $ac^2$
    10         $a_2b_2c_1$    221    $a_1b_1c_0$    110    $ab$
    11         $a_2b_2c_2$    222    $a_1b_1c_1$    111    $abc$
    12         $a_2b_2c_3$    223    $a_1b_1c_2$    112    $abc^2$

* In this representation, the absence of a letter implies that the factor which it represents
is at the lowest level. In general, the exponents on the letters agree with the subscripts
used in Method III. Thus, $a_0b_1c_2$ becomes $a^0b^1c^2 = bc^2$. The symbol (1) is used to
signify that each factor is at its lowest level, that is, $a_0b_0c_0$ is equivalent to
$a^0b^0c^0 = (1)$.

There is another item of terminology that should be mentioned in
the present context. This item is best explained by example. The
factorial arrangement of the treatments used in Table 10.1 would be
referred to by the statistician as a 2 × 2 × 3 factorial. Similarly,
Ratner's investigation would be termed a 3 × 10 × 4 factorial, while the
home washing study was a 2 × 2 × 2 × 2 × 2 = $2^5$ factorial.
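
The treatment combinations of a 2 × 2 × 3 factorial can also be listed mechanically. The sketch below is only an illustration: the function name `method_v` and the dictionary `levels` are hypothetical, and the labels it prints follow Methods II, IV, and V of Table 10.1 as described in the table footnote.

```python
from itertools import product

# Number of levels of each factor in the 2 x 2 x 3 factorial of Table 10.1.
levels = {"a": 2, "b": 2, "c": 3}
names = list(levels)


def method_v(combo, names):
    """Label a combination of 0-based levels in the style of Method V."""
    parts = []
    for name, level in zip(names, combo):
        if level == 1:
            parts.append(name)  # exponent 1 is left unwritten
        elif level > 1:
            parts.append(f"{name}^{level}")
        # level 0 (lowest level): the letter is omitted
    return "".join(parts) or "(1)"


# product() varies the last factor fastest, matching the order of the table.
combos = list(product(*(range(levels[name]) for name in names)))
for number, combo in enumerate(combos, start=1):
    method_ii = "".join(str(level + 1) for level in combo)
    method_iv = "".join(str(level) for level in combo)
    print(number, method_ii, method_iv, method_v(combo, names))
```

Changing the entries of `levels` gives the listing for any other factorial, for example five factors at 2 levels each for the home washing study.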

Before leaving (for the time being) the subject of factorials, it is only
fair that the reader be warned of a double use of certain symbols which
could (but should not) lead to confusion. The situation is as follows:
It is common practice to use the letters a, b, c, ... to denote not only
the various factors but also the number of levels of the factors. For
example, a statistical model might be written as

$$Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}; \qquad i = 1, \ldots, a; \quad j = 1, \ldots, b \tag{10.1}$$

where

$\mu$ = mean effect
$\alpha_i$ = effect of the ith level of factor a
$\beta_j$ = effect of the jth level of factor b
$\epsilon_{ij}$ = experimental error

and

$$\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = 0$$

while the $\epsilon_{ij}$ are NID $(0, \sigma^2)$. In this and similar
situations, the decision to use a and b to denote not only the factors
but also the number of levels of each factor should not lead to any
confusion. The sense in which a letter is being used in any particular
instance should always be perfectly clear from the context.

10.14      EFFECTS  AND    INTERACTIONS 

Whenever a statistician undertakes the design of an experiment, he
must first ascertain the objectives of the researcher. Frequently, the
objectives may be very simple. For example, the researcher may wish
to determine the effect on the yield of a chemical reaction of changing
the operating temperature while all other factors (variables) are held
constant at predetermined levels. On the other hand, he may have no
interest whatsoever in temperature; his concern might be only with
pH. In this case, an experiment would be planned to determine the
effect of pH under the restriction that all other factors (including
temperature) are held constant.

Experiments  such  as  those  referred  to  in  the  preceding  paragraph  are 
fine  if  the  effects  of  pH  and  temperature  (on  the  response  variable)  are 
independent.  However,  if  we  know  that  the  factors  are  interdependent, 
or  if  we  are  doubtful  of  the  validity  of  an  assumption  of  independence, 
then  an  experiment  which  estimates  both  main  effects  and  interactions 
should  be  recommended.  Such  an  experiment  would,  of  course,  utilize 
a  factorial  arrangement  of  the  treatments. 

Example  10.7 

It is suggested that the effects of pH and temperature on the yield of
a certain chemical reaction are not independent. It is, therefore,
recommended that a design be adopted which utilizes treatment
combinations formed by combining different levels of the two factors
involved. It is decided that two levels of each factor will be
investigated. Denoting pH by a and temperature by b, the four treatment
combinations might be:

$a_0b_0$ = pH of 4.0 and a temperature of 30°C.
$a_0b_1$ = pH of 4.0 and a temperature of 40°C.
$a_1b_0$ = pH of 4.4 and a temperature of 30°C.
$a_1b_1$ = pH of 4.4 and a temperature of 40°C.


Before  we  can  say  how  the  performance  of  an  experiment  involving  a 
factorial  set  of  treatment  combinations  will  help  answer  our  questions 
concerning  independence  of  the  factors,  it  will  be  necessary  to  define 
certain  terms.  These  terms  (effect,  main  effect,  and  interaction)  have 
already  been  used  without  explanation.  The  time  has  now  arrived  when 
specific  definitions  must  be  given. 

We shall consider first a $2^2$ factorial such as the one used in Example
10.7. If we agree that the symbols $a_ib_j$ ($i = 0, 1$; $j = 0, 1$) can
represent not only the treatment combinations but also the average
yields from all experimental units subjected to the similarly designated
treatment combinations, it is possible to define effect, main effect,
and interaction as noted below. (NOTE: To avoid complicating the
discussion, it has been assumed that each average yield was obtained
from the same number of experimental units.)

Effect of a at level $b_0$ of b:
$$a_1b_0 - a_0b_0 \tag{10.2}$$

Effect of a at level $b_1$ of b:
$$a_1b_1 - a_0b_1 \tag{10.3}$$

Main effect of a:
$$A = [(a_1b_0 - a_0b_0) + (a_1b_1 - a_0b_1)]/2 = (a_1 - a_0)(b_1 + b_0)/2 \tag{10.4}$$

Similarly,

Effect of b at level $a_0$ of a:
$$a_0b_1 - a_0b_0 \tag{10.5}$$

Effect of b at level $a_1$ of a:
$$a_1b_1 - a_1b_0 \tag{10.6}$$

Main effect of b:
$$B = [(a_0b_1 - a_0b_0) + (a_1b_1 - a_1b_0)]/2 = (a_1 + a_0)(b_1 - b_0)/2 \tag{10.7}$$

If a and b were acting independently, the effect of a at $b_0$ and the
effect of a at $b_1$ should be the same. (A similar statement holds for
the effects of b at $a_0$ and $a_1$.) Thus, any difference in these two
effects is a measure of the degree of interdependence between the
factors, that is, of the extent to which a and b interact. Accordingly,
we define the interaction between a and b by

$$AB = [(a_1b_1 - a_0b_1) - (a_1b_0 - a_0b_0)]/2 = (a_1 - a_0)(b_1 - b_0)/2 \tag{10.8}$$


If the symbols used in the preceding definitions are simplified by
replacing $a_0$ and $b_0$ by unity, and $a_1$ and $b_1$ by a and b, the
effects and interactions may be defined by

$$4M = (a + 1)(b + 1) \tag{10.9}$$

$$2A = (a - 1)(b + 1) \tag{10.10}$$

$$2B = (a + 1)(b - 1) \tag{10.11}$$

$$2AB = (a - 1)(b - 1) \tag{10.12}$$

where M represents the mean effect (i.e., the mean yield of all
experimental units).

Example  10.8 

Let us assume that an experiment has been performed involving
treatments such as described in Example 10.7. To illustrate the
computation of main effects and interactions, three hypothetical cases
will be examined.

            I                    II                   III
         $a_0$  $a_1$         $a_0$  $a_1$         $a_0$  $a_1$
$b_0$     63     67    $b_0$   63     67    $b_0$   63     67
$b_1$     69     73    $b_1$   69     78    $b_1$   69     70

Case   I: M = 68, A = 4, B = 6, and AB = 0.

Case  II: M = 69.25, A = 6.5, B = 8.5, and AB = 2.5.

Case III: M = 67.25, A = 2.5, B = 4.5, and AB = -1.5.
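
The computations of Example 10.8 can be checked with a short script. This is a sketch only: the function name `effects_2x2` and its argument order are assumptions made for illustration, while the arithmetic follows the definitions of the main effects and interaction given earlier, including Equations (10.9) through (10.12).

```python
def effects_2x2(a0b0, a1b0, a0b1, a1b1):
    """Return (M, A, B, AB) for the four average yields of a 2^2 factorial."""
    mean = (a0b0 + a1b0 + a0b1 + a1b1) / 4
    effect_a_at_b0 = a1b0 - a0b0
    effect_a_at_b1 = a1b1 - a0b1
    main_a = (effect_a_at_b0 + effect_a_at_b1) / 2
    main_b = ((a0b1 - a0b0) + (a1b1 - a1b0)) / 2
    # Interaction: half the difference between the two simple effects of a.
    interaction = (effect_a_at_b1 - effect_a_at_b0) / 2
    return mean, main_a, main_b, interaction


# Average yields in the order a0b0, a1b0, a0b1, a1b1 for the three cases.
cases = {
    "I": (63, 67, 69, 73),
    "II": (63, 67, 69, 78),
    "III": (63, 67, 69, 70),
}
results = {name: effects_2x2(*yields) for name, yields in cases.items()}
for name, (m, a, b, ab) in results.items():
    print(f"Case {name}: M = {m}, A = {a}, B = {b}, AB = {ab}")
```

In Case I the effect of a is the same at both levels of b, so AB is zero; in Cases II and III the two simple effects differ, and the sign of AB shows the direction of the difference.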

Having defined and illustrated (for a $2^2$ factorial) the concepts of
effects, main effects, and interactions, it is appropriate that an attempt
be made to put these ideas into words rather than symbols. However,
the reader is reminded (again) that the understanding of a concept is
much more important than the memorization of any definition, whether
it be in words or in mathematical symbolism. With that reminder, let
us now attempt definitions of the two terms, "interaction" and "main
effect." Utilizing the earlier definitions and the illustrations in Example
10.8, we may say that:

(1) Interaction is the differential response to one factor in combination
    with varying levels of a second factor applied simultaneously. That
    is, interaction is an additional effect due to the combined influence
    of two (or more) factors.

(2) The main effect of a factor is a measure of the change in the response
    variable to changes in the level of the factor averaged over all levels of
    all the other factors.

It should be clear that the concepts described as effects and
interactions will also be present in situations involving more than two
factors. For example, in a case involving four factors, there would be
four main effects, six two-factor interactions involving the combined
effect of two factors averaged over the other two factors, four
three-factor interactions involving the combined effect of three factors
averaged over the one remaining factor, and one four-factor interaction
involving the combined effect of all four factors. Extensive discussion
of these ideas will be deferred until a later chapter.
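
The counts quoted for the four-factor case (4, 6, 4, and 1) are the numbers of ways of choosing one, two, three, or four factors out of four. A brief sketch, using the hypothetical factor letters a, b, c, and d, lists the terms:

```python
from itertools import combinations
from math import comb

factors = ["a", "b", "c", "d"]  # hypothetical letters for the four factors

counts = {}
for order in range(1, len(factors) + 1):
    # Every subset of `order` factors gives one main effect or interaction.
    terms = ["".join(subset).upper() for subset in combinations(factors, order)]
    counts[order] = len(terms)
    assert len(terms) == comb(len(factors), order)
    print(order, len(terms), terms)
```

The same loop applies to any number of factors; with n factors there are $2^n - 1$ main effects and interactions in all.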

Before terminating the discussion of effects and interactions,
however, two additional topics will be mentioned. One is a convenient
method of determining the effects in $2^n$ factorials; the other is the
definition of effects and interactions for $3^n$ factorials.

To illustrate the method of calculating effects in $2^n$ factorials, let us
consider a $2^3$ factorial. Using the abbreviated notation for treatment
combinations given in Table 10.1, and letting these symbols also represent
the average yields of experimental units subjected to the similarly
designated treatment combinations, the main effects and interactions
may be found by adding and subtracting yields according to the signs
given in Table 10.2. It can easily be verified that this procedure is
simply a tabular device for calculating the effects and interactions
defined by

$$X = (a \pm 1)(b \pm 1)(c \pm 1)/2^{2} \qquad (10.13)$$

where the sign in each set of parentheses is plus if the corresponding
capital letter is not contained in $X$ and negative if it is contained in $X$,
and the right-hand side is to be expanded and the yields substituted for
the appropriate treatment combination symbols. Equation (10.13) may
be extended to the $2^n$ factorial case by simply adding more multiplicative
factors as shown in Equation (10.14),

$$X = [(a \pm 1)(b \pm 1)(c \pm 1)(d \pm 1) \cdots]/2^{n-1}. \qquad (10.14)$$

TABLE 10.2-Schematic Representation of Effects and
Interactions in a $2^3$ Factorial

Effect or                    Treatment Combination
Interaction     (1)    a     b     ab    c     ac    bc    abc

8M               +     +     +     +     +     +     +     +
4A               -     +     -     +     -     +     -     +
4B               -     -     +     +     -     -     +     +
4AB              +     -     -     +     +     -     -     +
4C               -     -     -     -     +     +     +     +
4AC              +     -     +     -     -     +     -     +
4BC              +     +     -     -     -     -     +     +
4ABC             -     +     +     -     +     -     -     +
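The sign rule of Equation (10.13) can be checked mechanically. The following Python sketch is only an illustration: the function names and the yields in `demo` are hypothetical and do not come from the text. It rebuilds the signs of Table 10.2 from the rule "minus when the letter is in the effect but absent from the treatment combination" and divides by $2^{n-1}$ as in Equation (10.14).

```python
from itertools import combinations, product


def treatment_labels(factors="abc"):
    """Labels in standard order: (1), a, b, ab, c, ac, bc, abc."""
    labels = []
    for levels in product((0, 1), repeat=len(factors)):
        # Reverse so that the first factor varies fastest.
        present = [f for f, lv in zip(factors, reversed(levels)) if lv]
        labels.append("".join(present) or "(1)")
    return labels


def sign(effect, label):
    """Sign of one treatment combination in the expansion for `effect`."""
    s = 1
    for f in effect.lower():
        # A factor (f - 1) contributes -1 when f is absent from the label.
        if f not in label:
            s = -s
    return s


def effects(yields, factors="abc"):
    """Main effects and interactions from a dict of mean yields."""
    n = len(factors)
    labels = treatment_labels(factors)
    out = {}
    for r in range(1, n + 1):
        for combo in combinations(factors, r):
            name = "".join(combo).upper()
            total = sum(sign(name, lab) * yields[lab] for lab in labels)
            out[name] = total / 2 ** (n - 1)
    return out


# Hypothetical mean yields, chosen to be purely additive.
demo = {"(1)": 10, "a": 14, "b": 12, "ab": 16,
        "c": 11, "ac": 15, "bc": 13, "abc": 17}
print(effects(demo))
```

With these made-up yields the interactions all vanish, which is what an additive response should give.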


When factors are investigated at only two levels, the best the researcher
can do (apart from a simple test of significance) is to determine:
(1) whether the effect of a factor is positive or negative and
(2) whether the factors are independent. However, when factors are
investigated at more than two levels, the researcher can probe more
deeply. He now has the opportunity to see if the effect of a factor is
linear or nonlinear. In most experimental work, this is a very important
item of information, and thus the researcher should give serious
consideration to factorials involving more than two levels of the factors
when planning an investigation.

If an experiment is designed involving two factors, each at three
levels, the main effects and interactions may be used to study the
nonlinearity of the response variable. Rather than go into excessive detail
at this time, only the pertinent formulas will be presented. In these
formulas we have again used the symbols $a_ib_j$ ($i = 0, 1, 2$; $j = 0, 1, 2$) to
represent both the treatment combinations and the yields from the
treatment combinations.

Linear effect of $a$ = $A_L = (a_2 - a_0)(b_0 + b_1 + b_2)/3$  (10.15)

Quadratic effect of $a$ = $A_Q = (a_2 - 2a_1 + a_0)(b_0 + b_1 + b_2)/6$  (10.16)

Linear effect of $b$ = $B_L = (a_0 + a_1 + a_2)(b_2 - b_0)/3$  (10.17)

Quadratic effect of $b$ = $B_Q = (a_0 + a_1 + a_2)(b_2 - 2b_1 + b_0)/6$  (10.18)

Linear X Linear interaction = $A_LB_L = (a_2 - a_0)(b_2 - b_0)/2$  (10.19)

Linear X Quadratic interaction
= $A_LB_Q = (a_2 - a_0)(b_2 - 2b_1 + b_0)/4$  (10.20)

Quadratic X Linear interaction
= $A_QB_L = (a_2 - 2a_1 + a_0)(b_2 - b_0)/4$  (10.21)

Quadratic X Quadratic interaction
= $A_QB_Q = (a_2 - 2a_1 + a_0)(b_2 - 2b_1 + b_0)/8$  (10.22)


Example  10.9 

Consider  an  experiment  similar  to  that  described  in  Example  10.7 
but  involving  three  levels  of  pH  and  three  levels  of  temperature.  As  in 
Example  10.8,  three  cases  will  be  considered. 


               I                    II                   III
         a0   a1   a2         a0   a1   a2         a0   a1   a2

b0       10   13   16         22   10   14         10   12   11
b1       13   16   19         25   13   17         14   17   21
b2       16   19   22         30   18   22         19   25   35

Case I: $A_L = 6$, ..., and $A_QB_Q = 0$.
Case II: ...
Case III: ..., and $A_QB_Q = -1/8$.


It should be noted that the "no interaction" result in cases I and II
could have been predicted by observing that the pattern of differences
between yields at varying levels of b is the same for each level of a.
(NOTE: We could just as easily have examined the differences between
yields at varying levels of a for each level of b.)
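The formulas of Equations (10.15) through (10.22) are products of a coefficient set for a and a coefficient set for b, so they can be evaluated with one small routine. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, rows of the array are taken to be levels of b and columns levels of a, and the Quadratic X Linear divisor of 4 is assumed by symmetry with Equation (10.20). The two arrays are the yields of Cases I and III in Example 10.9.

```python
def effects_3x3(y):
    """Effects for a 3 x 3 factorial; y[j][i] is the yield at a_i, b_j."""
    one = (1, 1, 1)      # sum over the three levels
    lin = (-1, 0, 1)     # linear contrast: level 2 minus level 0
    quad = (1, -2, 1)    # quadratic contrast: level 2 - 2*level 1 + level 0

    def form(ca, cb, divisor):
        total = sum(ca[i] * cb[j] * y[j][i]
                    for i in range(3) for j in range(3))
        return total / divisor

    return {
        "AL": form(lin, one, 3), "AQ": form(quad, one, 6),
        "BL": form(one, lin, 3), "BQ": form(one, quad, 6),
        "ALBL": form(lin, lin, 2), "ALBQ": form(lin, quad, 4),
        "AQBL": form(quad, lin, 4), "AQBQ": form(quad, quad, 8),
    }


case_1 = [[10, 13, 16], [13, 16, 19], [16, 19, 22]]
case_3 = [[10, 12, 11], [14, 17, 21], [19, 25, 35]]

print(effects_3x3(case_1))
print(effects_3x3(case_3))
```

Case I is symmetric in a and b, so its results do not depend on the row and column assumption; for Case III only the Linear X Linear and Quadratic X Quadratic components are free of that assumption.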

From the preceding discussion, it should be evident that there is a
great deal to be said about effects and interactions. As a matter of fact,
what started out to be a short section exposing the reader to general
concepts has grown (necessarily, I believe) into a rather detailed
discussion of the topic. On the other hand, the surface has only been
scratched. There is much more that can be said. Some of this additional
material will be discussed in later chapters, while the remainder will be
left to books devoted to experimental design. For those who wish to
read further on these topics, the following references are recommended:
Cochran and Cox (13), Cox (14), Davies (16), Federer (20), Finney
(21 and 22), Kempthorne (28), Quenouille (39), and Yates (46).

10.15     TREATMENT  COMPARISONS 

In most experiments involving several treatments, the researcher will
be interested in certain specific comparisons among the treatment
means. To aid in making such comparisons, the statistician finds it
convenient to talk in terms of "contrasts." Algebraically, a contrast
among the quantities $T_1, \ldots, T_k$ (where $T_i$ is the sum of $n_i$
observations) is defined by

$$C_j = c_{1j}T_1 + c_{2j}T_2 + \cdots + c_{kj}T_k \qquad (10.23)$$

where

$$\sum_{i=1}^{k} n_i c_{ij} = 0. \qquad (10.24)$$

If each $n_i = n$, that is, if each $T_i$ is the sum of the same number of
observations, then the necessary condition for a contrast reduces to

$$\sum_{i=1}^{k} c_{ij} = 0. \qquad (10.25)$$


Example  10.10 

Consider an experiment involving batteries in which four treatments
are to be investigated. The four treatments happen to be four different
electrolytes. However, it is noted that electrolytes No. 1 and No. 2 are
quite similar in composition, that No. 3 and No. 4 are also similar, but
that Nos. 1 and 2 differ considerably from Nos. 3 and 4. It would, then,
be reasonable to plan comparisons of: (1) treatments 1 and 2 versus
treatments 3 and 4, (2) treatment 1 versus treatment 2, and (3) treatment
3 versus treatment 4. Assuming that 20 batteries (experimental units)
are used and that they are allocated to the treatments in the ratio
4:2:5:9, what would be the form of the contrasts for the selected
treatment comparisons? Denoting the treatment totals by $T_i$ ($i = 1, 2, 3, 4$),
the desired contrasts are:

$$C_1 = 7T_1 + 7T_2 - 3T_3 - 3T_4$$
$$C_2 = (1)T_1 + (-2)T_2 + (0)T_3 + (0)T_4$$
$$C_3 = (0)T_1 + (0)T_2 + (9)T_3 + (-5)T_4$$

One might ask how we obtained the coefficients $c_{ij}$ ($i = 1, 2, 3, 4$;
$j = 1, 2, 3$) used in the above comparisons. A short explanation at this
moment should serve to clear up any difficulties. Consider the case of
comparison $C_1$: What we are actually attempting to do is to compare the
mean of 6 observations (4 + 2) with the mean of 14 observations (5 + 9).
It is, of course, necessary to adjust for the spurious weighting given by
our comparison of treatment totals based on unequal numbers of
observations. Since the smallest integer which may be divided evenly by
both 6 and 14 is 42, we see that 7 and 3 are the indicated weights to
be used if our comparison is to be unaffected by the differing numbers
of observations associated with the various treatments. The remaining
coefficients are found in a like manner.
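The least-common-multiple argument can be written out as a short Python sketch. The function names are hypothetical and the code is only an illustration of the reasoning in Example 10.10, not a general procedure given in the text. Treatments are indexed from 0, so treatment 1 is index 0.

```python
from math import gcd


def contrast_coefficients(n, plus, minus):
    """Integer weights comparing the mean of group `plus` with group `minus`.

    n     -- number of observations behind each treatment total
    plus  -- indices of treatments on the positive side
    minus -- indices of treatments on the negative side
    """
    n_plus = sum(n[i] for i in plus)
    n_minus = sum(n[i] for i in minus)
    # Smallest integer divisible by both group sizes.
    m = n_plus * n_minus // gcd(n_plus, n_minus)
    c = [0] * len(n)
    for i in plus:
        c[i] = m // n_plus
    for i in minus:
        c[i] = -(m // n_minus)
    return c


def is_contrast(c, n):
    """Condition (10.24): the n-weighted coefficients sum to zero."""
    return sum(ni * ci for ni, ci in zip(n, c)) == 0


n = [4, 2, 5, 9]  # allocation of the 20 batteries
c1 = contrast_coefficients(n, plus=[0, 1], minus=[2, 3])
c2 = contrast_coefficients(n, plus=[0], minus=[1])
c3 = contrast_coefficients(n, plus=[2], minus=[3])
print(c1, c2, c3)
```

For the first comparison the group sizes are 6 and 14, the least common multiple is 42, and the weights 42/6 = 7 and 42/14 = 3 follow directly.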

Example  10.11 

Consider a research situation similar to that described in Example
10.10, but involving five treatments. Suppose that four batteries are
allocated to each treatment. If treatment No. 2 represents a commonly
used electrolyte, while Nos. 1, 3, 4, and 5 are newly developed
electrolytes in which Nos. 1 and 3 are of type A and Nos. 4 and 5 are of type
B, the contrasts specified in Table 10.3 are appropriate for the obvious
treatment comparisons.

TABLE 10.3-Symbolic Representation of the Contrasts for the
Treatment Comparisons Specified in Example 10.11

                          Electrolyte
Contrast        1       2       3       4       5

C1             -1      +4      -1      -1      -1
C2             +1       0      +1      -1      -1
C3             +1       0      -1       0       0
C4              0       0       0      +1      -1

Let us now take note of another item of importance. If two contrasts,

$$C_p = c_{1p}T_1 + c_{2p}T_2 + \cdots + c_{kp}T_k \qquad (10.26)$$

and

$$C_q = c_{1q}T_1 + c_{2q}T_2 + \cdots + c_{kq}T_k, \qquad (10.27)$$

are such that

$$\sum_{i=1}^{k} n_i c_{ip} c_{iq} = 0 \quad (p \neq q), \qquad (10.28)$$

then the contrast $C_p$ is orthogonal to the contrast $C_q$. (NOTE: It is
common practice to speak of orthogonal contrasts or orthogonal
treatment comparisons.) If $n_i = n$ (for all $i$), the orthogonality condition
reduces to

$$\sum_{i=1}^{k} c_{ip} c_{iq} = 0, \quad p \neq q. \qquad (10.29)$$

The reader can easily verify that the contrasts specified in Examples
10.10 and 10.11 are, in each case, orthogonal. In addition, the perceptive
student will have noted that the effects and interactions discussed
in Section 10.14 were also orthogonal contrasts.
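The verification left to the reader takes only a few lines of Python. This is a sketch with hypothetical function names; the coefficient lists are those of Example 10.10 (allocation 4:2:5:9) and Example 10.11 (four batteries per treatment), and the test applied is the weighted condition of Equation (10.28).

```python
from itertools import combinations


def orthogonal(cp, cq, n):
    """Condition (10.28): sum of n_i * c_ip * c_iq equals zero."""
    return sum(ni * a * b for ni, a, b in zip(n, cp, cq)) == 0


def mutually_orthogonal(contrasts, n):
    """True when every pair of contrasts satisfies condition (10.28)."""
    return all(orthogonal(cp, cq, n)
               for cp, cq in combinations(contrasts, 2))


# Example 10.10: unequal numbers of observations per treatment.
n_10 = [4, 2, 5, 9]
contrasts_10 = [[7, 7, -3, -3], [1, -2, 0, 0], [0, 0, 9, -5]]

# Example 10.11: four batteries on each of five electrolytes.
n_11 = [4, 4, 4, 4, 4]
contrasts_11 = [[-1, 4, -1, -1, -1], [1, 0, 1, -1, -1],
                [1, 0, -1, 0, 0], [0, 0, 0, 1, -1]]

print(mutually_orthogonal(contrasts_10, n_10))
print(mutually_orthogonal(contrasts_11, n_11))
```

A pair such as treatment 1 versus 2 and treatment 1 versus 3 fails the same check, which is one quick way to see that two comparisons share information.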

At this time, the following question might well be asked, namely,
"Are orthogonal contrasts better than nonorthogonal contrasts?"
Intuitively, orthogonal contrasts seem to be preferable. (NOTE: Actually,
they are preferable if one wishes the estimates derived from the
different contrasts to be uncorrelated.) However, occasionally it is
desirable to design an experiment with the expressed intent of analyzing a
set of nonorthogonal contrasts. In such cases, the probability
statements accompanying the associated tests of significance are of an
ambiguous nature (due to the correlation between the contrasts), and
much care should be exercised in interpreting the experimental results.

One final remark needs to be made and then we may move on to
another topic. The remark is the following: Regardless of the
desirability of orthogonal contrasts, the statistician should not let his
preference for such a state of affairs override the needs of the researcher. By
this is meant that, as nice as it is to have a set of orthogonal contrasts,
only those contrasts which are meaningful to the researcher should be
analyzed.

10.16      STEPS   IN    DESIGNING  AN    EXPERIMENT 

Each  statistician  has  his  own  list  of  steps  which  he  follows  when 
designing  an  experiment.  However,  a  comparison  of  various  lists 
reveals  that  they  all  cover  essentially  the  same  points. 

According to Kempthorne (28), a statistically designed experiment
consists of the following steps:

(1)  Statement  of  the  problem. 

(2) Formulation of hypotheses.

(3)  Devising  of  experimental  technique  and  design. 

(4) Examination of possible outcomes and reference back to the reasons
for the inquiry to be sure the experiment provides the required
information to an adequate extent.

(5) Consideration of the possible results from the point of view of the
statistical procedures which will be applied to them, to ensure that
the conditions necessary for these procedures to be valid are
satisfied.

(6)  Performance  of  experiment. 

(7)  Application  of  statistical  techniques  to   the  experimental  results. 

(8)  Drawing  conclusions  with  measures  of  the  reliability  of  estimates  of 
any    quantities    that    are    evaluated,    careful    consideration    being 
given  to  the  validity  of  the  conclusions  for  the  population  of  objects 
or  events  to  which  they  are  to  apply. 

(9) Evaluation of the whole investigation, particularly with other
investigations on the same or similar problems.4

In a later section, these steps will be illustrated through the
consideration of some design problems.

Since the designing of an experiment or the planning of a test program
is such an important part of any investigation, the statistician
must make every effort to obtain all the relevant information. This
will usually require one or more conferences with the researcher, and
the asking of many questions. It has been my experience that the
amount of time consumed in this phase can be materially reduced if, at
the preliminary meeting between the researcher (e.g., a development
engineer) and the statistician, time is taken to explore the relationship
between research and/or development experimentation and the
statistical design of experiments. (NOTE: Frequently, there is a formidable
communications barrier which must be overcome.) One of the
best ways to convince the researcher of the need for the multitude of
questions posed by the statistician is to give him (in the first meeting)
a "check list" which specifies various stages in the planning of a test
program. (An even more efficient arrangement if you are the statistician
in an industrial organization is to distribute copies of such a list to
all persons who may at some time have need of your services.) One
such list, prepared by Bicking (3), is reproduced below for your
consideration.

4 O. Kempthorne, The Design and Analysis of Experiments, John Wiley and
Sons, Inc., New York, 1952, p. 10.

Check  List  for  Planning  Test  Programs 

A.  Obtain  a  clear  statement  of  the  problem 

1.  Identify  the  new  and  important  problem  area 

2.  Outline  the  specific  problem  within  current  limitations 

3.  Define  exact  scope  of  the  test  program 

4. Determine relationship of the particular problem to the whole
research or development program

B.  Collect  available  background  information 

1.  Investigate  all  available  sources  of  information 

2.  Tabulate  data  pertinent  to  planning  new  program 

C.  Design  the  test  program 

1.  Hold  a  conference  of  all  parties  concerned 

a.  State  the  propositions  to  be  proved 

b.  Agree  on  magnitude  of  differences  considered  worthwhile 

c.  Outline  the  possible  alternative  outcomes 

d.  Choose  the  factors  to  be  studied 

e.  Determine   the  practical  range  of  these  factors   and  the  specific 
levels  at  which  tests  will  be  made 

f. Choose the end measurements which are to be made

g.  Consider  the  effect  of  sampling  variability  and  of  precision  of  test 
methods 

h.  Consider  possible  inter-relationships  (or  "interactions")  of  the 
factors 

i. Determine limitations of time, cost, materials, manpower,
instrumentation and other facilities and of extraneous conditions, such
as weather

j.    Consider  human  relation  angles  of  the  program 

2.  Design  the  program  in  preliminary  form 

a.  Prepare  a  systematic  and  inclusive  schedule 

b.  Provide   for  step-wise   performance  or   adaptation   of  schedule  if 
necessary 

c.  Eliminate    effect    of    variables    not   under    study   by    controlling, 
balancing,  or  randomizing  them 

d.  Minimize  the  number  of  experimental  runs 

e.  Choose  the  method  of  statistical  analysis 

f.  Arrange  for  orderly  accumulation  of  data 

3.  Review  the  design  with  all  concerned 

a.  Adjust  the  program  in  line  with  comments 

b.  Spell  out  the  steps  to  be  followed  in  unmistakable  terms 

D.  Plan  and  carry  out  the  experimental  work 

1.  Develop  methods,  materials,  and  equipment 

2.  Apply  the  methods  or  techniques 

3.  Attend  to  and  check  details;  modify  methods  if  necessary 

4.  Record  any  modifications  of  program  design 

5.  Take  precautions  in  collection  of  data 

6.  Record  progress  of  the  program 
E. Analyze the data

1.  Reduce  recorded  data,  if  necessary,  to  numerical  form 

2.  Apply  proper  mathematical  statistical  techniques 

F.  Interpret  the  results 

1.  Consider  all  the  observed  data 

2.  Confine  conclusions  to  strict  deductions  from  the  evidence  at  hand 

3.  Test  questions  suggested  by  the  data  by  independent  experiments 

4.  Arrive  at  conclusions  as  to  the  technical  meaning  of  results  as  well 
as  their  statistical  significance 

5.  Point  out  implications  of  the  findings  for  application  and  for  further 
work 

6.  Account  for  any  limitations  imposed  by  the  methods  used 

7.  State  results  in  terms  of  verifiable  probabilities 

G. Prepare the report

1.  Describe  work  clearly  giving  background,  pertinence  of  the  problems 
and  meaning  of  results 

2.  Use  tabular  and  graphic  methods  of  presenting  data  in  good  form  for 
future  use 

3. Supply sufficient information to permit reader to verify results and
draw his own conclusions

4.  Limit  conclusions  to  objective  summary  of  evidence  so  that  the  work 
recommends  itself  for  prompt  consideration  and  decisive  action.5 

The reader should realize, of course, that the two lists (of steps in
designing experiments) presented in this section are only guides. Very
seldom will the various steps be tackled and settled in the particular
order given. The statistician does not operate in such a mechanical and
routine fashion. Questions will be asked and answers received which will
trigger new lines of thought, and thus the planning conference will find
itself jumping from one step to another in a seemingly haphazard
manner. Furthermore, it is not surprising to find, as the conference
progresses and new information is brought forth, the same step being
considered several times. Regardless of the repetition inherent in such a
procedure, it is a good procedure.

In summary, then, the designing of an experiment can be a
time-consuming and, occasionally, a painful process. Thus, the use of check
lists such as those presented earlier can be most helpful (as a
supplement to common sense) in making relatively certain that nothing has
been overlooked.

10.17      ILLUSTRATIONS OF THE STATISTICIAN'S APPROACH
TO DESIGN PROBLEMS

To illustrate the manner in which a statistician approaches a design
problem, a series of examples will be considered. The first of these will
demonstrate the application of Kempthorne's nine steps, while the
remainder will illustrate various topics discussed in Sections 10.1
through 10.16.

5 Charles A. Bicking, "Some uses of statistics in the planning of experiments,"
Industrial Quality Control, Vol. 10, No. 4, Jan., 1954, p. 23.

Example  10.12 

Suppose a machine is constructed for the purpose of generating a
random series of 0's and 1's. If the machine is truly a generator of
random binary elements, it should, among other things, yield 0's 50 per
cent of the time and 1's 50 per cent of the time. It is proposed that an
experiment be devised to check on this particular aspect of the
randomness of the machine.

The preceding paragraph illustrates Kempthorne's Step 1, the
statement of the problem. If we formulate $H: p_0 = 1/2$ (where $p_0$ stands
for the probability of a 0) and $A: p_0 \neq 1/2$, we have taken care of Step 2.
The devising of an experimental technique and design (Step 3) is fairly
simple. In this case we shall operate the device a certain number of
times, say $n$, record the proportion of 0's ($\hat{p}_0$), and see if this is in close
enough agreement with the hypothesis $H$. If the agreement is good,
we accept $H$; if the agreement is poor, we reject $H$ and accept $A$, the
alternative hypothesis. The only remaining part of Step 4 to be taken
care of is the determination of the number of operations of the device
that are required before we feel safe in making a decision. Suppose it is
desired that the probability of rejecting $H: p_0 = 1/2$ (when it is really true)
should be no greater than $\alpha = 0.05$. This implies $n \geq 6$, as can easily be
shown. Note carefully the concept of rejecting a true hypothesis. The
value of $n$ would also be influenced by fixing the probability of
accepting a false hypothesis, but we choose to ignore this in the present
example. Step 5 consists, in this case, of recognizing that the results
will be analyzed using the binomial distribution, and thus we should
make certain that the repeated events (operations of the device) are
statistically independent. Step 6 is evident, though sometimes
troublesome. When discussing Steps 3 and 4, the content of Step 7 was alluded
to, and all that remains is the formalizing of the analysis. Step 8 implies
that we should produce a confidence interval estimate of the true
probability of producing a 0 with our device; that is, a point estimate, $\hat{p}_0$,
is not sufficient. We must also be very careful to state that our
conclusions only hold for the particular device operated, unless this device was
randomly selected from a larger group (or population) of devices.
Had other devices of a similar nature been investigated, the results of
our experiment should be evaluated along with all pertinent
information from the allied studies (Step 9).
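One way to see where the lower bound on n comes from is sketched below in Python. It rests on an assumption that the text does not spell out: the test is two-sided, so the smallest usable rejection region consists of the two most extreme outcomes (all 0's or all 1's), whose combined probability under H is 2(1/2)^n. The function name is hypothetical.

```python
def smallest_n(alpha=0.05):
    """Smallest n for which H: p0 = 1/2 can be rejected at level alpha.

    Assumes a two-sided test whose rejection region must contain at least
    the outcomes "all 0's" and "all 1's", each with probability (1/2)**n.
    """
    n = 1
    while 2 * 0.5 ** n > alpha:
        n += 1
    return n


print(smallest_n(0.05))
```

With fewer operations than this, even the most lopsided result would be too probable under H to justify rejection at the 5 per cent level.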

The reader will probably have recognized the similarity of this
illustration to Example 7.4. It is, of course, the same. All we have done here
is "dress up" the problem and use it to illustrate the various steps in
the design of an experiment.

Example  10.13 

Consider the problem of an engineer who wishes to assess the relative
effects of eight treatments (for the moment undefined) on the activated
life of a particular type of thermal battery. Assume that 64 relatively
homogeneous batteries are available for experimentation. With only this
much information, the most efficient design would be to randomly
assign the batteries to the eight treatments (groups) subject to the
restriction that 8 batteries be allocated to each treatment. Such an
assignment is illustrated in Table 10.4. The reader should note that the
major design decisions reached in this example were concerned with
balancing and grouping. (NOTE: The type of design described above is
known as a completely randomized design.)6

TABLE 10.4-Random Assignment of Batteries to
Treatments as Described in Example 10.13

                       Treatments
  A      B      C      D      E      F      G      H

  9     58     37     18     14     21     48     43
 22     53     36     38      1     15     63     56
 64     26     30     33     50      3     60     41
 34     11      5     29     27     45     57     23
 17     52      6     61     16     47     25     10
  4     51     13     40     49     32     59     12
 31      8      2     35     46     19      7     20
 28     14     54     39     44     62     55     42

Numbers in the table represent serial numbers of units; a random order of testing would
also be determined.
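An assignment of the kind shown in Table 10.4 can be generated by shuffling the serial numbers and dealing them out in equal groups. The Python sketch below is an illustration only; the function name and the seed are hypothetical, and a different seed gives a different (equally valid) assignment from the one printed in the table.

```python
import random


def completely_randomized(units, treatments, per_treatment, seed=None):
    """Randomly allocate units to treatments, the same number to each."""
    if len(units) != len(treatments) * per_treatment:
        raise ValueError("units must equal treatments x per_treatment")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    plan = {}
    for k, t in enumerate(treatments):
        # Consecutive slices of the shuffled list form the groups.
        group = shuffled[k * per_treatment:(k + 1) * per_treatment]
        plan[t] = sorted(group)
    return plan


plan = completely_randomized(range(1, 65), "ABCDEFGH", 8, seed=1)
for treatment, batteries in plan.items():
    print(treatment, batteries)
```

A second shuffle of all 64 serial numbers would supply the random order of testing mentioned in the table note.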

Example 10.14

As a second illustration of a completely randomized design, consider
the agronomist who has 28 homogeneous experimental plots available
for testing the relative effects of 4 different fertilizers on the yield of a
particular variety of oats. A reasonable design would be to impose, at
random, a different fertilizer on each plot. If the restriction is imposed
that 7 of the experimental plots be allocated to each fertilizer
(treatment), complete balance will have been achieved.

Example  10.15 

Referring to Example 10.13, suppose you are now advised that the
64 batteries consist of 8 batteries from each of 8 different production
lots. How will this additional information affect the design? If it is
suspected that there are real differences among the lots, the precision
of the experiment can be improved by removing the lot-to-lot variation
from the estimate of experimental error. Such an improvement in
design may be accomplished by assigning the treatments to the batteries
at random within each lot. Such a restricted randomization is illustrated
in Table 10.5. (NOTE: The type of design described above is known as
a randomized complete block design.)7

The  major  benefit  resulting  from  this  type  of  blocking  is  a  gain  in 
efficiency  in  analysis.  That  is,  more  sensitive  tests  of  significance  for 
treatment  differences  can  be  made  and  shorter  confidence  interval 
estimates  of  treatment  effects  can  be  obtained. 
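The restricted randomization of Example 10.15 differs from the completely randomized case only in that the shuffle is done separately inside each lot. The Python sketch below illustrates this; the function name and seed are hypothetical, and the lot membership (Lot 1 holds batteries 1 to 8, Lot 2 holds 9 to 16, and so on) follows the assumption stated with Table 10.5.

```python
import random


def randomized_complete_block(blocks, treatments, seed=None):
    """Assign every treatment once, in random order, within each block."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block} needs one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # fresh randomization for each block
        plan.update(zip(units, order))
    return plan


# Eight lots of eight consecutively numbered batteries.
lots = {lot: list(range(8 * (lot - 1) + 1, 8 * lot + 1))
        for lot in range(1, 9)}
plan = randomized_complete_block(lots, "ABCDEFGH", seed=1)
for lot, units in lots.items():
    print(lot, [f"{u}-{plan[u]}" for u in units])
```

Because each lot receives all eight treatments, lot-to-lot differences cancel out of every treatment comparison.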

6  See  Chapter  11  for  further  discussion  of  completely  randomized  designs. 

7  See  Chapter  12  for  further  discussion  of  randomized  complete  block  designs. 
TABLE 10.5-Random Assignment of Treatments to Batteries
Within Lots as Described in Example 10.15

                            Lots
  1       2       3       4       5       6       7       8

 1-H     9-H    17-E    25-C    33-E    41-G    49-B    57-D
 2-C    10-E    18-A    26-D    34-F    42-H    50-E    58-A
 3-F    11-D    19-B    27-E    35-D    43-B    51-G    59-F
 4-B    12-F    20-G    28-B    36-G    44-C    52-F    60-G
 5-E    13-G    21-C    29-H    37-C    45-E    53-H    61-C
 6-G    14-C    22-F    30-G    38-A    46-A    54-A    62-H
 7-D    15-B    23-D    31-F    39-B    47-F    55-D    63-B
 8-A    16-A    24-H    32-A    40-H    48-D    56-C    64-E

Numbers in the table represent serial numbers of units; the letters represent treatments.
It will be observed that we have assumed Lot No. 1 contains batteries 1 to 8, Lot No. 2
contains batteries 9 to 16, etc.

Example  10.16 

Another illustration of a randomized complete block design is provided
by the following problem in nutrition research. A nutritionist
wishes to assess the relative effects of four newly developed rations on
the weight-gaining ability of rats. He has 20 rats available for
experimentation. Examination of the pedigrees of the experimental animals
indicates that the 20 rats consist of 4 rats from each of 5 litters. The
statistician would, under these circumstances, recommend that the
rations be assigned to the rats at random within each litter (block).

Example  10.17 

Consider next a somewhat more complex problem. Assume that we
are again concerned with testing batteries, but this time the problem
arises during the development phase. The development engineer has to
reach a decision about three things: (1) how much electrolyte should
be incorporated in this particular model, (2) what weight of heat paper
should be used in the construction of the batteries, and (3) what effect
will the temperature at which the batteries are activated have on the
activated life of the batteries?

Denoting electrolyte by a, heat paper by b, and temperature by c,
and assuming that two levels of each factor are to be investigated, the
eight treatment combinations might be as shown in Table 10.6.

TABLE 10.6-Factors, Factor Levels, and Treatment Combinations for
the Experiment Described in Example 10.17

                          Factors and Factor Levels
Treatment       Amount of        Weight of Heat      Test
Combination     Electrolyte      Paper               Temperature
                (gm/cell)        (gm/cell)           (°F)

(1)                 1                4                  -50
a                   2                4                  -50
b                   1                6                  -50
ab                  2                6                  -50
c                   1                4                  100
ac                  2                4                  100
bc                  1                6                  100
abc                 2                6                  100

It is decided that 16 batteries will be built to each of the four
"electrolyte-heat paper" specifications, providing a total of 64 batteries for
testing. As a precaution against bias being introduced because the last
batteries built might be better than the first batteries built, the 64
batteries will be built in a random order. Next, in each set of 16 batteries,
8 will be randomly selected for testing at low temperature, and the
remaining 8 will be reserved for testing at high temperature.

When this stage is reached, that is, once each battery has been built
and assigned a test temperature, the 64 batteries will be arranged in
random order for individual testing. As you can probably anticipate,
this last restriction frequently proves to be unpopular, especially if only
one temperature chamber is available. (NOTE: The design that has
been formulated is a completely randomized design involving a $2^3$
factorial with 8 experimental units per treatment.)
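The three randomizations in Example 10.17 (build order, temperature assignment within each specification, and test order) can be laid out as a short Python sketch. It is only an illustration of the plan described above; the function name, field names, and seed are hypothetical, while the factor levels are those of Table 10.6.

```python
import random
from collections import Counter
from itertools import product


def plan_example_10_17(seed=None):
    rng = random.Random(seed)
    # Four electrolyte / heat paper specifications (gm/cell).
    specs = list(product((1, 2), (4, 6)))

    # Step 1: build 16 batteries per specification in random order.
    build = [spec for spec in specs for _ in range(16)]
    rng.shuffle(build)
    batteries = [{"id": k + 1, "electrolyte": e, "heat_paper": h}
                 for k, (e, h) in enumerate(build)]

    # Step 2: within each specification, 8 at low and 8 at high temperature.
    for spec in specs:
        group = [b for b in batteries
                 if (b["electrolyte"], b["heat_paper"]) == spec]
        temps = [-50] * 8 + [100] * 8
        rng.shuffle(temps)
        for b, t in zip(group, temps):
            b["temp_F"] = t

    # Step 3: random order of individual testing.
    test_order = batteries[:]
    rng.shuffle(test_order)
    return batteries, test_order


batteries, test_order = plan_example_10_17(seed=1)
counts = Counter((b["electrolyte"], b["heat_paper"], b["temp_F"])
                 for b in batteries)
print(counts)
```

The tally at the end confirms the balance claimed in the note: each of the eight treatment combinations receives 8 experimental units.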

Example  10.18 

Suppose that we now consider a slightly different problem. Like many
of us engaged in research, the development engineer is often hard pressed
for funds. If this were the case in the situation described in Example
10.17, the development engineer might place a preliminary order for 8
batteries. Of the 8, 2 would be assembled to each of the 4
"electrolyte-heat paper" combinations. His plan, of course, would be to
test 1 battery in each pair at low temperature and 1 at high
temperature. This testing would, naturally, take place in a random order.

Next, assume that, after the first 8 batteries are built and tested,
funds are made available for the building and testing of 8 additional
batteries. These would be ordered without delay in order to provide
some replication of the experiment. However, due to the way in which
the batteries were produced and tested, that is, first 8 and then 8 more,
it is clear that the combined analysis of all 16 batteries must take into
account the blocking which is implicit in the data. (NOTE: In this
example we have a randomized complete block design consisting of two
blocks and involving a $2^3$ factorial set of treatment combinations.)

Example  10.19 

Referring again to the problem described in Example 10.17, suppose
that two additional complications arise: (1) only 8 batteries can be
tested in a normal work day, and (2) in the interests of economy, the
test engineer wishes to place 4 batteries in the temperature chamber at
the same time. Under these restrictions, we have a natural set of blocks,
namely, days. Further, within each block it would be desirable to test
1 battery corresponding to each of the 8 treatment combinations.
Because of the temperature chamber restriction, we would decide,
randomly for each day, whether to first test batteries at high temperature
and then test batteries at low temperature, or vice versa. Once this
decision is made, the random order of testing batteries within
temperatures must be specified. As might be expected, the eventual
analysis of the data will take due cognizance of all restrictions placed
on the test program. (NOTE: The type of design illustrated in this
example is known as a split plot design.⁸)
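
The two-stage randomization of Example 10.19 can be sketched as below. The sketch assumes 8 test days (64 batteries at 8 per day), which is an inference from Example 10.17 rather than something stated in the example, and the function names are hypothetical.

```python
import random

SPECS = ["(1)", "a", "b", "ab"]   # electrolyte/heat-paper specifications
TEMPS_F = [-50, 100]              # low and high test temperatures


def split_plot_day(rng):
    """Randomize one day (block): temperature order first, then batteries."""
    temps = list(TEMPS_F)
    rng.shuffle(temps)            # whole plots: which temperature is run first
    runs = []
    for temp in temps:
        specs = list(SPECS)
        rng.shuffle(specs)        # subplots: test order within that temperature
        runs.extend((temp, spec) for spec in specs)
    return runs


def split_plot_plan(n_days, seed=None):
    """Return {day: [(temperature, specification), ...]} for every day."""
    rng = random.Random(seed)
    return {day: split_plot_day(rng) for day in range(1, n_days + 1)}


plan = split_plot_plan(n_days=8, seed=0)
for temp, spec in plan[1]:
    print(temp, spec)
```

Because the 4 batteries tested at one temperature share a chamber run, temperature is randomized only once per day, while the specifications are randomized within each run. That restriction is what distinguishes this plan from the completely randomized plan of Example 10.17.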

10.18     ADVANTAGES AND DISADVANTAGES OF STATISTICALLY
DESIGNED EXPERIMENTS

Having spent considerable time discussing various aspects of, and
techniques in, experimental design, it is appropriate that the
advantages and disadvantages of statistically designed experiments be
considered. These will, of course, be expressed in different ways by
different people. However, as was true for the steps involved in designing
experiments, an examination of various lists of advantages and
disadvantages will show that all the lists cover essentially the same points.

Advantages  of  Statistically  Designed  Experiments 

Bicking (3) has listed the advantages of statistical designs over old
kinds of designs (nonstatistical) as follows:

(1)  Close teamwork is required between the statisticians and the
research or development scientists with consequent advantages in the
analysis and interpretation stages of the program

(2)  Emphasis is placed on anticipating alternatives and on systematic
pre-planning, yet permitting step-wise performance and producing
only data useful for analysis in later combinations

(3)  Attention is focused on inter-relationships and on identifying and
measuring sources of variability in results

(4)  The required number of tests is determined reliably and often may
be reduced

(5)  Comparison of effects of changes is more precise because of
grouping of results

(6)  The correctness of conclusions is known with definite mathematical
preciseness.⁹

If  these  advantages  truly  exist,  and  I  believe  they  do,  the  value  of 
statistical  aid  in  planning  experiments  is  evident  and  should  always  be 
sought. 

⁸ See Chapter 13 for further discussion of split plot designs.

⁹ Charles A. Bicking, "Some uses of statistics in the planning of experiments,"
Industrial Quality Control, Vol. 10, No. 4, Jan., 1954, p. 22.

Disadvantages of Statistically Designed Experiments

Happily, there are more advantages than disadvantages associated
with statistically designed experiments. In fact, I found it somewhat
difficult to formulate a list of disadvantages. However, a careful
reading of Mandelson (30), together with a realistic appraisal of the
implementation of certain statistically designed experiments, did yield
the following possible disadvantages:

(1)  Such designs and their analyses are usually accompanied by
statements couched in the technical language of statistics. It
would be much better if the statistician would translate such
statements into terms that are meaningful to the nonstatistician.
In addition, the statistician should not overlook the
value of presenting the results in graphical form. As a matter
of fact, he should always consider plotting the data as a
preliminary step to a more analytical approach.

(2)  Many statistical designs, especially when first formulated, are
criticized as being too expensive, complicated, or time-consuming.
Such criticisms, when valid, must be accepted in good
grace and an honest attempt made to improve the situation,
provided that the solution of the problem is not compromised.

Before terminating our discussion of the advantages and
disadvantages of statistically designed experiments, some mention should
be made of particular advantages and disadvantages associated with
factorials. This is deemed necessary because of the important role that
factorials play in the design and analysis of experiments. (NOTE:
There will undoubtedly be some overlap between the advantages and
disadvantages given for statistically designed experiments in general
and those about to be given for factorials. However, a small amount
of repetition will not be harmful.)

Advantages  of  Factorials 

(1)  Greater efficiency in the use of available experimental
resources is achieved.

(2)  Information is obtained about the various interactions.

(3)  The experimental results are applicable over a wider range of
conditions; that is, due to the combining of the various factors
in one experiment, the results are of a more comprehensive
nature.

(4)  There is a gain due to the hidden replication arising from the
factorial arrangement.

Disadvantages  of  Factorials 

(1)  The  experimental  setup  and  the  resulting  statistical  analysis 
are  more  complex. 

(2)  With  a  large  number  of  treatment  combinations  the  selection 
of  homogeneous  experimental  units  becomes  more  difficult. 

(3)  Certain of the treatment combinations may be of little or no
interest; consequently, some of the experimental resources
may be wasted.

10.19     Summary

In this chapter several extremely important topics have been
discussed. It is recommended that they be re-examined from time to time
as the reader progresses through the succeeding chapters. Such a
periodic reappraisal will prove beneficial for a number of reasons, for
example: (1) a thorough understanding of the concepts, principles, and
techniques involved is essential for a fruitful study of the
experimental designs which are the subjects of the next three chapters, and
(2) an appreciation of these important principles should manifest itself
in improved experimentation.

Problems 

10.1  Choosing practical situations from your special field of interest,
describe three problems whose solutions must be determined
experimentally.

10.2  With  reference  to  Problem  10.1,  discuss  the  need  for  an  experimental 
design  in  each  of  the  three  illustrations. 

10.3  It is sometimes said that experimental design is a subject which
consists of two (almost distinct) parts: (a) the choice of treatments,
experimental units, and characteristics to be observed; (b) the choice
of the number of experimental units and the method of assigning the
treatments to the experimental units. Discuss this classification from
the points of view of the researcher and the statistician.

10.4  Define  "systematic  error"  and  discuss  the  relationship  between  this 
factor  and  the  statistical  design  of  experiments. 

10.5  Some terms that occur rather frequently in the literature are:
(a) accuracy, (b) precision, (c) validity, (d) reliability, and (e) bias.
Restricting your remarks to the theory of statistics or to
applications of statistical methods, define and discuss each of these terms.

10.6  Cox  (14)  uses  "Designs  for  the  Reduction  of  Error"  as  the  title  of 
one  of  his  chapters.  What  does  this  title  suggest  to  you? 

10.7  With  reference  to  factorials,  what  is  meant  by  the  phrase  "hidden 
replication"? 

10.8  Discuss  the  use  of  concomitant  information  in  experimental  design. 

10.9  Choosing practical situations from your own special field of interest,
illustrate the concept of confounding. Give examples of: (a) unavoidable
confounding, (b) unintentional confounding, and (c) intentional
confounding.

10.10  Choosing practical situations from your own special field of interest,
illustrate the concept of randomization.

10.11  With  reference  to  Problem   10.10,    discuss  the  difficulties   (if  any) 
associated  with  the  randomization  process. 

10.12  Give  your  interpretation  of  the  phrase  "restricted  randomization." 

10.13  What  would  you  do  if,  in  the  planning  of  a  randomized  complete 
block  design,  the  same  order  of  treatments  occurred  (randomly)  in 
each  block? 

10.14  Discuss the following ways in which treatments can be assigned to
experimental units: (a) randomly, (b) subjectively, and
(c) systematically. Give illustrations which show the benefits, dangers,
and difficulties involved in each of the three approaches.

10.15  Cox (14) uses the phrase "Randomization as a Device for
Concealment" as the heading of one of the sections in his book. Without
referring to his discussion, what do you believe he has in mind?

10.16  Cox (14) makes a distinction between factors that represent a
treatment applied to the experimental units (treatment factors) and
factors that correspond to a classification of the experimental units into
two or more types (classification factors). Give illustrations of each
of these from situations in your own special field of interest.

10.17  The  statement  has  been  made  that  an  uncontrolled  and  unmeasured 
variable  may  be  of  sufficient  importance  to  lead  to  the  conclusion 
that  two  controlled  factors  interact  to  a  significant  degree.  Discuss 
this  idea,  including  all  possible  implications.   What  safeguards  do 
we  have  against  such  a  result  occurring? 

10.18  Cox (14) also states that it is sometimes convenient to classify factors
as follows: (a) specific qualitative factors, (b) quantitative factors,
(c) ranked qualitative factors, and (d) sampled qualitative factors.
How would you define each of these? Compare your ideas with those
expressed by Cox.

10.19  Show  graphically  what  is  meant  by  an  interaction.  Illustrate  your 
ideas  using  the  data  of  Examples  10.8  and  10.9. 

10.20  Explain  the  relationship,  if  any,  between  regression  functions  (i.e., 
response  functions)  and  the  concepts  of  effects  and  interactions. 

10.21  How  would  you  go  about  selecting  the  factors  to  be  investigated  in 
an  experiment?  Illustrate  with  examples  from  your  own  specific  field 
of  interest. 

10.22  Assuming  the  factors  have  been  decided  upon,  how  would  you  go 
about  selecting  the  factor  levels?  Illustrate  with  examples  from  your 
own  specific  field  of  interest. 

10.23  Choosing  practical  situations  from  your  own  special  field  of  interest, 
illustrate  completely  randomized,  randomized  complete  block,  and 
split  plot  designs. 

10.24  Indicate how the examples provided in answer to Problem 10.23
attempted to "control error."

10.25  What is meant by the precision of an experiment? of a contrast?

10.26  What is meant by a sequential experiment? Is there any other kind?
Please discuss.

10.27  In Problem 10.3, reference was made to ". . . the choice of
treatments, experimental units, and characteristics to be observed."
Illustrate each of these with examples from your own special field of
interest.

10.28  Discuss the following items relative to the selection of experimental
units: (a) number of units, (b) size of units, (c) shape of units,
(d) independence of units.

10.29  What  is  meant  by  a  "control"  treatment? 

10.30  Cox (14) classifies observations into six groups: (a) primary
observations, (b) substitute primary observations, (c) explanatory
observations, (d) supplementary observations for increasing precision,
(e) supplementary observations for detecting interactions, and
(f) observations for checking the application of the treatments.
Please try to define and illustrate each of these. Then compare your
ideas with those expressed by Cox.

10.31  Building on the samples given in Section 10.16, construct your own
list of "steps in designing an experiment."

10.32  Contrast the one-factor-at-a-time method of experimentation with
the factorial approach. Construct a table which shows and compares
the advantages and disadvantages of each.

10.33  Define: (a) absolute experiments and (b) comparative experiments.
Give examples of each. With which type is this book mainly
concerned?

10.34  Consider the following "elements" of experimental method:

(a)  control, or the elimination of the effects of extraneous variables

(b)  accuracy of instruments and data acquisition

(c)  reduction of the number of variables to be investigated

(d)  planning of the test sequence in advance of the start of
experimentation

(e)  detection of malfunctions

(f)  testing for reasonableness of results

(g)  analysis and interpretation of results

Evaluate the foregoing list by comparing it with the ideas expressed
in this chapter.

10.35  Choosing  practical  situations  from  your  own  special  field  of  interest, 
give  three  examples  of  statistically  designed  experiments.  For  each 
of  these,  point  out  where  and  how  the  concepts  of  this  chapter  were 
employed. 

10.36  Choose a practical problem in your own area of specialization.
Following, where practicable, the philosophy expressed in this chapter,
design an experiment to provide data relevant to the problem. Justify
all of your decisions and relate them to the discussion in the text. If
feasible, perform the experiment and then analyze and interpret the
results.

CHAPTER 11

COMPLETELY RANDOMIZED DESIGN


IN CHAPTER 10, several experimental designs were illustrated. In the
present chapter, we propose to discuss the simplest of these designs,
namely, the completely randomized design, in considerable detail.
Much attention will, of course, be given to methods of analyzing data
arising from such a design, and it will be observed that analysis of
variance (frequently abbreviated as AOV or ANOVA) is the method most
widely used.

11.1      DEFINITION     OF     A     COMPLETELY     RANDOMIZED 
DESIGN 

A completely randomized (CR) design is a design in which the
treatments are assigned completely at random to the experimental units, or
vice versa. That is, it is a design that imposes no restrictions, such as
blocking, on the allocation of the treatments to the experimental units.
Of course, as in Examples 10.13 and 10.14, some degree of balance may
be sought.

Because of its simplicity, the completely randomized design is widely
used. However, the researcher is cautioned that its use should be
restricted to those cases in which homogeneous experimental units are
available. If such units cannot be obtained, some blocking should be
utilized to increase the efficiency of the design.

Example  11.1 

Given four fertilizers, we wish to test the null hypothesis that there
are no differences among the effects of these fertilizers on the yield of
corn. We shall assume there are 20 experimental plots available to the
research worker. A sound procedure would be to place each fertilizer on
an equal number of experimental plots so that our estimates of the
mean effect of each fertilizer will have equal weight. Then, we insist
that the fertilizers be assigned to the plots at random. This may be
accomplished by numbering our plots from 1 through 20 and then
drawing tickets at random from a hat, 5 tickets being identified by
coloring or code mark with each of the 4 fertilizers. The first one drawn
specifies the treatment for plot No. 1, the second for plot No. 2, and
so on.

Example  11.2 

If,  in  the  preceding  example,  only  17  plots  were  available,  some  lack 
of  balance  would  be  inevitable.  Assuming  that  more  precise  information 
is  desired  on  fertilizer  No.  1,  the  randomization  procedure  could  be 
modified  so  that,  for  example,  8  plots  would  be  treated  with  fertilizer 
No.  1,  3  plots  with  No.  2,  3  with  No.  3,  and  3  with  No.  4. 
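
The ticket-drawing procedure of Examples 11.1 and 11.2 might be mimicked in Python as follows. The function name and the seeded pseudo-random generator are illustrative assumptions; a hat and tickets, or a table of random numbers, would serve equally well.

```python
import random


def completely_randomized(counts, seed=None):
    """Assign treatments to plots completely at random.

    counts maps each treatment to the number of plots it is to receive;
    the result maps plot number (1, 2, ...) to a treatment.
    """
    rng = random.Random(seed)
    # One "ticket" per plot, marked with its treatment.
    tickets = [trt for trt, n in counts.items() for _ in range(n)]
    rng.shuffle(tickets)          # drawing the tickets from the hat
    # The first ticket drawn goes to plot No. 1, the second to plot No. 2, ...
    return {plot: trt for plot, trt in enumerate(tickets, start=1)}


# Example 11.1: 20 plots, 5 per fertilizer.
balanced = completely_randomized({1: 5, 2: 5, 3: 5, 4: 5}, seed=11)
# Example 11.2: 17 plots, with extra precision sought on fertilizer No. 1.
unbalanced = completely_randomized({1: 8, 2: 3, 3: 3, 4: 3}, seed=11)
print(balanced)
print(unbalanced)
```

Only the numbers of tickets per fertilizer change between the two examples; the random draw itself is unrestricted in both, which is what makes the design completely randomized.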

11.2      COMPLETELY RANDOMIZED DESIGN WITH ONE
OBSERVATION PER EXPERIMENTAL UNIT

If, in a completely randomized design, $n_i$ experimental units were
subjected to the ith treatment ($i = 1, \ldots, t$) and only one observation
per experimental unit was obtained, the data would appear as in Table
11.1.

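TABLE 11.1-Symbolic Representation of Data in a Completely Randomized
Design (Unequal Numbers of Observations for Each Treatment)

                                        Treatment
                           1            2            ...   t             Total

Observations               $Y_{11}$     $Y_{21}$     ...   $Y_{t1}$
                           $Y_{12}$     $Y_{22}$     ...   $Y_{t2}$
                           ...          ...          ...   ...
                           $Y_{1n_1}$   $Y_{2n_2}$   ...   $Y_{tn_t}$

Totals                     $T_1$        $T_2$        ...   $T_t$         $T = \sum_{i=1}^{t} T_i$
Numbers of observations    $n_1$        $n_2$        ...   $n_t$         $\sum_{i=1}^{t} n_i$
Means                      $\bar{Y}_1$  $\bar{Y}_2$  ...   $\bar{Y}_t$   $\bar{Y} = T / \sum_{i=1}^{t} n_i$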

Using the equations:

$$\sum Y^2 = \text{total sum of squares} = \text{sum of the squares of all the observations} = \sum_{i=1}^{t} \sum_{j=1}^{n_i} Y_{ij}^2, \tag{11.1}$$

$$M_{yy} = \text{sum of squares due to the mean} = T^2 \Big/ \sum_{i=1}^{t} n_i, \tag{11.2}$$

$$T_{yy} = \text{among treatments sum of squares} = \sum_{i=1}^{t} \frac{T_i^2}{n_i} - M_{yy}, \tag{11.3}$$

and

$$E_{yy} = \text{experimental error sum of squares} = \sum Y^2 - M_{yy} - T_{yy}, \tag{11.4}$$

the ANOVA shown in Table 11.2 is obtained.

TABLE 11.2-ANOVA for Data of Table 11.1

Source of Variation       Degrees of Freedom            Sum of Squares   Mean Square*

Mean                      1                             $M_{yy}$         $\mathbf{M}$
Among treatments          $t - 1$                       $T_{yy}$         $\mathbf{T}$
Experimental error
  (within treatments)     $\sum_{i=1}^{t} (n_i - 1)$    $E_{yy}$         $\mathbf{E}$

Total                     $\sum_{i=1}^{t} n_i$          $\sum Y^2$

* The mean squares are found by dividing each sum of squares by the corresponding
degrees of freedom. To avoid confusion with symbols for effects and interactions (see
Chapter 10), the symbols for mean squares will always be set in boldface type. This
procedure will be adhered to throughout the remainder of this book.

If each $n_i = n$, that is, if the number of experimental units per
treatment is the same for all treatments, Table 11.1 would be modified
as shown in Table 11.3. Equations (11.1) through (11.4) would be
rewritten as:

$$\sum Y^2 = \sum_{i=1}^{t} \sum_{j=1}^{n} Y_{ij}^2, \tag{11.5}$$

$$M_{yy} = T^2 / tn, \tag{11.6}$$

$$T_{yy} = \sum_{i=1}^{t} T_i^2 / n - M_{yy}, \tag{11.7}$$

and

$$E_{yy} = \sum Y^2 - M_{yy} - T_{yy}. \tag{11.8}$$

The resulting ANOVA is shown in Table 11.4.

TABLE 11.3-Symbolic Representation of Data in a Completely Randomized
Design (Equal Numbers of Observations for Each Treatment)

                                        Treatment
                           1            2            ...   t             Total

Observations               $Y_{11}$     $Y_{21}$     ...   $Y_{t1}$
                           $Y_{12}$     $Y_{22}$     ...   $Y_{t2}$
                           ...          ...          ...   ...
                           $Y_{1n}$     $Y_{2n}$     ...   $Y_{tn}$

Totals                     $T_1$        $T_2$        ...   $T_t$         $T = \sum_{i=1}^{t} T_i$
Numbers of observations    $n$          $n$          ...   $n$           $tn$
Means                      $\bar{Y}_1$  $\bar{Y}_2$  ...   $\bar{Y}_t$   $\bar{Y} = T / tn$

TABLE 11.4-ANOVA for Data of Table 11.3

Source of Variation       Degrees of Freedom   Sum of Squares   Mean Square

Mean                      1                    $M_{yy}$         $\mathbf{M}$
Among treatments          $t - 1$              $T_{yy}$         $\mathbf{T}$
Experimental error
  (within treatments)     $t(n - 1)$           $E_{yy}$         $\mathbf{E}$

Total                     $tn$                 $\sum Y^2$

Up to this point, our discussion of a completely randomized design
with one observation per experimental unit has concentrated on the
calculation of the various sums of squares and mean squares, and on
the specification of the associated degrees of freedom. While the
calculation of the sums of squares and mean squares has been explained in
detail, no explanation has been given as to why the degrees of freedom
are as stated. However, the way in which the degrees of freedom are
found seems reasonably clear. Since the procedure will be illustrated
many times in this and succeeding chapters, no attempt will be made
to formulate and state general rules.
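
The calculations called for by Equations (11.1) through (11.4) can be sketched in a few lines of Python. The function name, the dictionary keys, and the three small groups of observations are made up for illustration; the equal-numbers case is simply the special case in which every list has the same length.

```python
def cr_anova(groups):
    """Sums of squares, degrees of freedom, and mean squares for a CR design.

    groups holds one list of observations per treatment (one observation
    per experimental unit); the list lengths n_i may differ.
    """
    t = len(groups)
    n = [len(g) for g in groups]               # n_i
    totals = [sum(g) for g in groups]          # T_i
    grand_total = sum(totals)                  # T
    total_ss = sum(y * y for g in groups for y in g)                     # (11.1)
    m_yy = grand_total ** 2 / sum(n)                                     # (11.2)
    t_yy = sum(ti ** 2 / ni for ti, ni in zip(totals, n)) - m_yy         # (11.3)
    e_yy = total_ss - m_yy - t_yy                                        # (11.4)
    df_treat, df_error = t - 1, sum(n) - t     # sum of (n_i - 1) equals sum(n) - t
    return {
        "total_ss": total_ss, "M_yy": m_yy, "T_yy": t_yy, "E_yy": e_yy,
        "df_treat": df_treat, "df_error": df_error,
        "T": t_yy / df_treat,                  # treatment mean square
        "E": e_yy / df_error,                  # experimental error mean square
    }


# Hypothetical data: three treatments with 3, 4, and 3 observations.
data = [[4, 6, 5], [8, 9, 10, 9], [3, 2, 4]]
table = cr_anova(data)
for name, value in table.items():
    print(name, value)
```

For these made-up data the sketch gives M_yy = 360, T_yy = 66, and E_yy = 6, with 2 and 7 degrees of freedom for treatments and experimental error, respectively.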

Before the preceding analyses of variance can be used for purposes
of statistical inference, certain assumptions must be made about the
observations. The nature of these assumptions will now be examined.
(NOTE: In general, the assumptions underlying analyses of variance
are the same as those usually associated with regression analyses.
These are additivity, linearity, normality, independence, and
homogeneous variances. That is, the statistical model most frequently
assumed in analysis of variance applications is a linear model to which
has been appended certain restrictions about independent observations
from normal distributions.)

The basic assumption for a completely randomized design with one
observation per experimental unit is that the observations may be
represented by the linear statistical model

$$
Y_{ij} = \mu + \tau_i + \epsilon_{ij}; \qquad i = 1, \ldots, t; \qquad
\begin{cases}
j = 1, \ldots, n_i & \text{(unequal numbers)} \\
j = 1, \ldots, n & \text{(equal numbers)}
\end{cases}
\tag{11.9}
$$

where $\mu$ is the true mean effect, $\tau_i$ is the true effect of the ith
treatment, and $\epsilon_{ij}$ is the true effect of the jth experimental unit
subjected to the ith treatment. (NOTE: $\epsilon_{ij}$ also includes the
effects of all other extraneous factors. However, we rely on the process
of randomization to prevent these effects from contaminating our
results.) In addition, it is customarily assumed that $\mu$ is a constant
while the $\epsilon_{ij}$ are NID $(0, \sigma^2)$.
However, the specification of the model is still incomplete, for
nothing has been said about the $\tau_i$. The researcher has two choices as
to what he can say about the $\tau_i$, namely: (1) $\sum_{i=1}^{t} \tau_i = 0$,
which reflects the researcher's decision that he is concerned only with
the t treatments present in his experiment, or (2) the $\tau_i$ are
NID $(0, \sigma_\tau^2)$, which reflects the researcher's decision that he is
concerned with a population of treatments of which only a random sample
(the t treatments) are present in his experiment. These two choices lead
to what the statistician refers to as Model I and Model II, respectively.
Incidentally, Model I is sometimes referred to as the analysis of
variance (fixed effects) model, while Model II is known as the component
of variance (random effects) model.
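
The distinction between the two models can be made concrete by generating artificial data from Equation (11.9) under each set of assumptions. The sketch below is purely illustrative: the parameter values, the function name, and the seed are arbitrary assumptions, and note that Python's `gauss` takes a standard deviation, not a variance.

```python
import random


def simulate_cr(mu, taus, n, sigma, rng):
    """Generate Y_ij = mu + tau_i + e_ij with e_ij ~ NID(0, sigma^2)."""
    return [[mu + tau + rng.gauss(0.0, sigma) for _ in range(n)] for tau in taus]


rng = random.Random(3)

# Model I (fixed effects): the t treatments are the only ones of interest,
# and their effects are constants that sum to zero.
fixed_taus = [-2.0, 0.5, 1.5]
model_1 = simulate_cr(mu=50.0, taus=fixed_taus, n=4, sigma=1.0, rng=rng)

# Model II (random effects): the t treatments are a random sample from a
# population of treatments whose effects are NID(0, sigma_tau^2).
sigma_tau = 2.0
random_taus = [rng.gauss(0.0, sigma_tau) for _ in range(3)]
model_2 = simulate_cr(mu=50.0, taus=random_taus, n=4, sigma=1.0, rng=rng)

print(model_1)
print(model_2)
```

A repetition of the experiment under Model I would reuse the same three effects, whereas under Model II a fresh sample of effects would be drawn each time.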

Once the foregoing assumptions have been made, the theory
outlined in Chapter 3 may be invoked to obtain "expected mean squares."
These expected mean squares can be of valuable assistance to the
researcher, for they indicate the proper procedure to be followed in
estimating parameters and/or testing hypotheses about parameters within
the framework of the assumed model. It is customary to exhibit these
expected mean squares in an additional column in ANOVA tables. So,
without further discussion at this time, we re-exhibit Table 11.2 as
Table 11.5 (Model I) and Table 11.6 (Model II) with certain expected
mean squares included. Similarly, Table 11.4 is re-exhibited as Table
11.7 (Model I) and Table 11.8 (Model II). (NOTE: While formal
derivations of expected mean squares will not be given in this book, certain
rules will be given to aid the researcher in finding these valuable
quantities. Until these rules are expounded, the reader is asked to accept
the results as given.)

TABLE 11.5-ANOVA for Data of Table 11.1 Showing Certain Expected
Mean Squares (Unequal Numbers of Observations per Treatment: Model I)

Source of             Degrees of                    Sum of       Mean           Expected Mean Square
Variation             Freedom                       Squares      Square

Mean                  1                             $M_{yy}$     $\mathbf{M}$
Among treatments      $t - 1$                       $T_{yy}$     $\mathbf{T}$   $\sigma^2 + \sum_{i=1}^{t} n_i \tau_i^2 / (t - 1)$
Experimental error    $\sum_{i=1}^{t} (n_i - 1)$    $E_{yy}$     $\mathbf{E}$   $\sigma^2$

Total                 $\sum_{i=1}^{t} n_i$          $\sum Y^2$

TABLE 11.6-ANOVA for Data of Table 11.1 Showing Certain Expected Mean
Squares (Unequal Numbers of Observations per Treatment: Model II)

Source of             Degrees of                    Sum of       Mean           Expected Mean Square*
Variation             Freedom                       Squares      Square

Mean                  1                             $M_{yy}$     $\mathbf{M}$
Among treatments      $t - 1$                       $T_{yy}$     $\mathbf{T}$   $\sigma^2 + n_0 \sigma_\tau^2$
Experimental error    $\sum_{i=1}^{t} (n_i - 1)$    $E_{yy}$     $\mathbf{E}$   $\sigma^2$

Total                 $\sum_{i=1}^{t} n_i$          $\sum Y^2$

* The constant $n_0$ is a sort of an average $n_i$, and it is defined by
$n_0 = \left[ \sum_{i=1}^{t} n_i - \sum_{i=1}^{t} n_i^2 \Big/ \sum_{i=1}^{t} n_i \right] \Big/ (t - 1)$.

TABLE 11.7-ANOVA for Data of Table 11.3 Showing Certain Expected Mean
Squares (Equal Numbers of Observations per Treatment: Model I)

Source of             Degrees of     Sum of       Mean           Expected Mean Square
Variation             Freedom        Squares      Square

Mean                  1              $M_{yy}$     $\mathbf{M}$
Among treatments      $t - 1$        $T_{yy}$     $\mathbf{T}$   $\sigma^2 + n \sum_{i=1}^{t} \tau_i^2 / (t - 1)$
Experimental error    $t(n - 1)$     $E_{yy}$     $\mathbf{E}$   $\sigma^2$

Total                 $tn$           $\sum Y^2$

TABLE 11.8-ANOVA for Data of Table 11.3 Showing Certain Expected Mean
Squares (Equal Numbers of Observations per Treatment: Model II)

Source of             Degrees of     Sum of       Mean           Expected Mean Square
Variation             Freedom        Squares      Square

Mean                  1              $M_{yy}$     $\mathbf{M}$
Among treatments      $t - 1$        $T_{yy}$     $\mathbf{T}$   $\sigma^2 + n \sigma_\tau^2$
Experimental error    $t(n - 1)$     $E_{yy}$     $\mathbf{E}$   $\sigma^2$

Total                 $tn$           $\sum Y^2$

Having performed the previously indicated calculations and having
determined certain expected mean squares (based on the specified
assumptions), we are now ready to proceed to the making of statistical
inferences. Just what types of inference will be made will, of course,
depend on the purpose for which the experiment was conducted. Three
common inferences concern themselves with the following problems:
(1) hypotheses about the relative effects of treatments, (2) estimation
of the magnitude of components of variance, and (3) estimation of the
mean effects of individual treatments. Each of these will now be
considered.

Consider first the hypothesis of "no differences among the effects of
the t treatments in the experiment." The way in which this hypothesis
was phrased indicates that Model I has been assumed. Thus, the
hypothesis may be expressed as $H\colon \tau_i = 0$ ($i = 1, \ldots, t$).
Examination of the expected mean squares in Tables 11.5 and 11.7
indicates (in each case) that, if H is true, both the experimental error
mean square and the among treatments mean square are estimates of
$\sigma^2$. Thus, if H is true, the ratio

$$F = \mathbf{T}/\mathbf{E} = \frac{\text{mean square for treatments}}{\text{experimental error mean square}} \tag{11.10}$$

is distributed as F with $\nu_1 = t - 1$ and
$\nu_2 = \sum_{i=1}^{t} (n_i - 1)$ [which is $t(n - 1)$ for equal numbers]
degrees of freedom because of the assumption that the $\epsilon_{ij}$ are
NID $(0, \sigma^2)$. If the value of F specified by Equation (11.10) exceeds
$F_{(1-\alpha)(\nu_1, \nu_2)}$, where $100\alpha$ per cent is the chosen
significance level, H will be rejected and the conclusion reached that
there are significant differences among the t treatments.
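
As a hedged illustration of the test based on Equation (11.10), the sketch below computes the two mean squares from deviations about the means (algebraically equivalent to the sums of squares defined earlier) and compares their ratio with a tabled critical value. The data are made up, the function name is hypothetical, and the critical value 4.74 is assumed to have been read from an F table for 2 and 7 degrees of freedom at the 5 per cent level.

```python
def f_ratio(groups):
    """Return (F, nu1, nu2) for a completely randomized design."""
    t = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Among treatments sum of squares, written as weighted squared deviations.
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Experimental error (within treatments) sum of squares.
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    nu1, nu2 = t - 1, n_total - t
    return (ss_treat / nu1) / (ss_error / nu2), nu1, nu2


data = [[4, 6, 5], [8, 9, 10, 9], [3, 2, 4]]   # hypothetical observations
f_value, nu1, nu2 = f_ratio(data)
f_critical = 4.74   # assumed: tabled 95th percentile of F with (2, 7) d.f.
reject = f_value > f_critical
print(f_value, nu1, nu2, reject)
```

Here F is about 38.5, which exceeds the tabled value, so H would be rejected for these made-up data.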

Had Model II been assumed, that is, had the hypothesis been
phrased as follows: "There are no differences among the effects of all
the treatments in the population from which the t treatments in the
experiment are a sample," the same test procedure would have evolved.
That is, under Model II, the hypothesis $H\colon \sigma_\tau^2 = 0$ would also
be tested by forming the ratio $F = \mathbf{T}/\mathbf{E}$. Why, then, have
we been so concerned over the distinction between the two models? There
are two reasons for our concern over the differences in assumptions
between Models I and II. These are: (1) the inferences in the two cases
are about entirely different populations; and (2) in more complex
analyses, quite different test procedures may be indicated. Many
illustrations of these differences will be forthcoming in later sections
of this and succeeding chapters.

Consider next the estimation of components of variance. Regardless
of which model is assumed (that is, Model I or Model II), it is clear
that

$$s^2 = \text{the experimental error mean square} = E \tag{11.11}$$

is an estimate of $\sigma^2$. However, if Model II is assumed, it is also
possible to estimate $\sigma_\tau^2$ by calculating

$$s_\tau^2 = \frac{(\text{mean square for treatments}) - (\text{experimental error mean square})}{\text{coefficient of } \sigma_\tau^2 \text{ in the expected mean square for treatments}}
= \begin{cases} (T - E)/n_0, & \text{for unequal numbers} \\ (T - E)/n, & \text{for equal numbers.} \end{cases} \tag{11.12}$$


Finally, let us consider the estimation of the mean effects of
individual treatments. It should be obvious that a point estimate of the
true mean effect of the ith treatment ($\mu_i = \mu + \tau_i$) is given by
$\bar{Y}_i$. However, since confidence interval estimates are desired, it
is necessary that we determine the standard error of the treatment mean.
In Section 6.5, the estimated variance of a sample mean,

$$\hat{V}(\bar{Y}) = s^2/n, \tag{11.13}$$

was used to define the standard error of the mean

$$s_{\bar{Y}} = s/\sqrt{n}. \tag{11.14}$$

Consequently, the estimated variance of the mean of the ith treatment in
a completely randomized design with one observation per experimental
unit is given by

$$\hat{V}(\bar{Y}_i) = \frac{\text{experimental error mean square}}{\text{number of observations in ith treatment}} = E/n_i = s^2/n_i \tag{11.15}$$

and the standard error of the mean of the ith treatment is given by

$$s_{\bar{Y}_i} = \sqrt{E/n_i} = s/\sqrt{n_i}. \tag{11.16}$$

Of course, if each $n_i = n$, the same standard error would be attached
to each sample mean. A $100\gamma$ per cent confidence interval estimate
of $\mu_i$ would then be determined by calculating

$$\left.\begin{matrix} L \\ U \end{matrix}\right\} = \bar{Y}_i \mp t_{[(1+\gamma)/2](\nu)}\, s_{\bar{Y}_i} \tag{11.17}$$

where $\nu$ is the number of degrees of freedom associated with E.
Considerable  time  has  been  spent  in  discussing  the  analysis  of  a
completely  randomized  design  with  one  observation  per  experimental 
unit.  For  example,  calculations  were  explained  in  detail,  assumptions 
were  carefully  stated,  expected  mean  squares  were  introduced,  and 
test  and  estimation  methods  were  developed  with  care.  Such  attention 
to  detail  was  deemed  appropriate  for  an  orderly  development  of  the 
methods  involved.  Further,  the  discussion  given  here  will  greatly  expe 
dite  the  future  presentation  of  similar  methods  for  more  complex  situa 
tions.  Some  examples  will  now  be  given. 

Example 11.3

Consider that an experiment similar to that described in Example
10.13 has been performed. However, only four treatments were
investigated and only 20 batteries were available for testing. The data
in Table 11.9 resulted. Following Equations (11.5) through (11.8), the
appropriate calculations are:

$$\sum Y^2 = 104{,}352$$
$$M_{yy} = (1444)^2/20 = 104{,}256.8$$
$$T_{yy} = [(369)^2 + (371)^2 + (345)^2 + (359)^2]/5 - 104{,}256.8 = 84.8$$
$$E_{yy} = 104{,}352 - 104{,}256.8 - 84.8 = 10.4.$$

These lead to the ANOVA shown in Table 11.10. The expected mean
square for treatments has been given for both Model I and Model II.
Since $F = 43.49 > F_{.99(3,16)} = 5.29$, the hypothesis
$H\colon \tau_i = 0$ $(i = 1, 2, 3, 4)$ or $H\colon \sigma_\tau^2 = 0$,
whichever applies, is rejected. Since the number of observations per
treatment is the same for each treatment, the standard error of a
treatment mean is $\sqrt{0.65/5} = \sqrt{0.13} = 0.36$ second.


TABLE 11.9-Activated Lives of Twenty Thermal Batteries Resulting From
Experiment Described in Example 11.3

                                      Treatment
                               1       2       3       4     Total
Observations (in seconds)     73      74      68      71
                              73      74      69      71
                              73      74      69      72
                              75      74      69      72
                              75      75      70      73
Totals                       369     371     345     359     1,444
Numbers of observations        5       5       5       5        20
Means                       73.8    74.2    69.0    71.8      72.2

TABLE 11.10-ANOVA for Data of Table 11.9

Source of            Degrees of   Sum of        Mean         Expected Mean Square                      F-Ratio
Variation            Freedom      Squares       Square
Mean                      1       104,256.8     104,256.8
Treatments                3            84.8         28.27    $\sigma^2 + (5/3)\sum_{i=1}^{4}\tau_i^2$    43.49
                                                             or $\sigma^2 + 5\sigma_\tau^2$
Experimental error       16            10.4          0.65    $\sigma^2$
Total                    20       104,352.0
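
The arithmetic of Example 11.3 can be checked with a few lines of code. The following is a minimal sketch, not part of the original text: the function name `one_way_anova` and the dictionary layout are illustrative assumptions, and the sums of squares follow the pattern used in the example (total, mean, treatments, and error by subtraction).

```python
from typing import Dict, List


def one_way_anova(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Sums of squares, mean squares and F for a completely randomized design.

    Hypothetical helper written for illustration only.
    """
    n_total = sum(len(obs) for obs in groups.values())
    t = len(groups)
    grand_total = sum(sum(obs) for obs in groups.values())

    total_ss = sum(y * y for obs in groups.values() for y in obs)  # sum of Y^2
    m_yy = grand_total ** 2 / n_total                              # due to the mean
    t_yy = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - m_yy
    e_yy = total_ss - m_yy - t_yy                                  # by subtraction

    ms_treat = t_yy / (t - 1)       # T, with t - 1 degrees of freedom
    ms_error = e_yy / (n_total - t)  # E, with sum(n_i - 1) degrees of freedom
    return {"total_ss": total_ss, "M_yy": m_yy, "T_yy": t_yy, "E_yy": e_yy,
            "T": ms_treat, "E": ms_error, "F": ms_treat / ms_error}


# Data of Table 11.9 (activated lives in seconds).
batteries = {
    "1": [73, 73, 73, 75, 75],
    "2": [74, 74, 74, 74, 75],
    "3": [68, 69, 69, 69, 70],
    "4": [71, 71, 72, 72, 73],
}

result = one_way_anova(batteries)
se_mean = (result["E"] / 5) ** 0.5  # standard error of a treatment mean, n = 5

for key in ("M_yy", "T_yy", "E_yy", "T", "E", "F"):
    print(f"{key:>5}: {result[key]:.2f}")
print(f"standard error of a treatment mean: {se_mean:.2f}")
```

The printed values can be compared directly with the entries of Table 11.10; the critical value $F_{.99(3,16)}$ still has to be read from a table of the F distribution.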

Example 11.4

Consider an experiment to study the effect of storage condition on
the moisture content of white pine lumber. Five storage methods were
investigated, with varying numbers of experimental units (sample
boards) being stored under each condition. The data in Table 11.11
were obtained. Following Equations (11.1) through (11.4), the
appropriate calculations are:

$$\sum Y^2 = 863.36$$
$$M_{yy} = (108.8)^2/14 = 845.53$$
$$T_{yy} = \left[\frac{(39.9)^2}{5} + \frac{(19.9)^2}{3} + \frac{(14.5)^2}{2} + \frac{(27.4)^2}{3} + \frac{(7.1)^2}{1}\right] - 845.53 = 10.66$$
$$E_{yy} = 863.36 - 845.53 - 10.66 = 7.17.$$


TABLE 11.11-Moisture Contents of Fourteen White Pine Boards Stored
Under Different Conditions

                                      Storage Conditions
                                1       2       3       4       5     Total
Observations (in per cent)     7.3     5.4     8.1     7.9     7.1
                               8.3     7.4     6.4     9.5
                               7.6     7.1            10.0
                               8.4
                               8.3
Totals                        39.9    19.9    14.5    27.4     7.1    108.8
Number of observations           5       3       2       3       1       14
Means                          8.0     6.6     7.3     9.1     7.1      7.8
Standard errors                0.4     0.5     0.6     0.5     0.9

This leads to the ANOVA shown in Table 11.12. Again, the expected
mean square for treatments is given for both Models I and II. Since
$F = 3.34 < F_{.95(4,9)} = 3.63$, we are unable to reject the hypothesis
$H\colon \tau_i = 0$ $(i = 1, \dots, 5)$ or $H\colon \sigma_\tau^2 = 0$.
The standard errors of the treatment means, presented in Table 11.11 for
convenience, were calculated using Equation (11.16) where $s^2 = 0.80$.

TABLE 11.12-ANOVA for Data of Table 11.11

Source of             Degrees of   Sum of     Mean      Expected Mean Square                       F-Ratio
Variation             Freedom      Squares    Square
Mean                       1       845.53     845.53
Storage conditions         4        10.66       2.67    $\sigma^2 + \sum_{i=1}^{5} n_i\tau_i^2/4$    3.34
                                                        or $\sigma^2 + n_0\sigma_\tau^2$
Experimental error         9         7.17       0.80    $\sigma^2$
Total                     14       863.36
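
The unequal-numbers case of Example 11.4 can be sketched in the same way. The code below is an illustration, not the author's procedure; the variable names are assumptions. It also evaluates $n_0$ using the usual formula $n_0 = (N - \sum n_i^2/N)/(t - 1)$, which is assumed here because the defining table (Table 11.7) lies outside this passage, and it applies Equation (11.12) as it would be used under Model II. Because the code carries full precision, its F-ratio (about 3.35) differs slightly from the 3.34 obtained from the rounded mean squares in Table 11.12.

```python
from math import sqrt

# Data of Table 11.11 (moisture content in per cent), one list per storage condition.
boards = {
    "1": [7.3, 8.3, 7.6, 8.4, 8.3],
    "2": [5.4, 7.4, 7.1],
    "3": [8.1, 6.4],
    "4": [7.9, 9.5, 10.0],
    "5": [7.1],
}

n_i = {k: len(v) for k, v in boards.items()}
N = sum(n_i.values())
t = len(boards)
grand_total = sum(sum(v) for v in boards.values())

total_ss = sum(y * y for v in boards.values() for y in v)
m_yy = grand_total ** 2 / N
t_yy = sum(sum(v) ** 2 / len(v) for v in boards.values()) - m_yy
e_yy = total_ss - m_yy - t_yy

T = t_yy / (t - 1)   # mean square for storage conditions
E = e_yy / (N - t)   # experimental error mean square, s^2
F = T / E

# Standard error of each treatment mean, Equation (11.16): sqrt(E / n_i).
std_errors = {k: sqrt(E / n) for k, n in n_i.items()}

# Assumed formula for n_0 (not shown in this passage) and Equation (11.12).
n_0 = (N - sum(n * n for n in n_i.values()) / N) / (t - 1)
s_tau2 = (T - E) / n_0  # meaningful only if Model II is assumed

print(f"T_yy = {t_yy:.2f}, E_yy = {e_yy:.2f}, F = {F:.2f}")
print({k: round(se, 1) for k, se in std_errors.items()})
print(f"n_0 = {n_0:.3f}, s_tau^2 = {s_tau2:.3f}")
```

The rounded standard errors agree with the last row of Table 11.11.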

11.3     THE RELATION BETWEEN A COMPLETELY RANDOMIZED
DESIGN AND "STUDENT'S" t-TEST OF $H\colon \mu_1 = \mu_2$ VERSUS
$A\colon \mu_1 \neq \mu_2$

In Section 7.20 it was mentioned that the analysis of variance
technique could be used as an alternative to "Student's" t-test when
examining the hypothesis $H\colon \mu_1 = \mu_2$. Clearly, this same
relationship exists when we have a completely randomized design
involving only two treatments. In this instance, the hypothesis (under
Model I) of $H\colon \tau_1 = \tau_2 = 0$ is equivalent to
$H\colon \mu_1 = \mu_2$ where $\mu_1 = \mu + \tau_1$ and
$\mu_2 = \mu + \tau_2$.


11.4     SUBSAMPLING    IN    A   COMPLETELY    RANDOMIZED 
DESIGN 

In many experimental situations, several observations may be
obtained on each experimental unit. If these observations are all on the
same characteristic (i.e., on the same variable), the process of
obtaining the observations is often referred to as subsampling. Some
examples of subsampling are:

(1)  In  the  battery  experiment  of  Example  11.3,  several  observa 
tions  per  battery  might  have  been  obtained  by  connecting 
several  clocks  to  each  battery.  These  several  observations  per 
battery  would  be  referred  to  as  "samples  within  experimental 
units." 

(2)  In a field experiment, the researcher may not have time to
harvest (totally) each experimental plot. Thus, he might randomly
select several quadrats per plot and harvest the grain in each
selected quadrat. Again, we would describe these observations as
"samples within experimental units."
(3)  In  a  food  technology  experiment  involving  the  storage  of 
frozen  strawberries,  10  pints  (experimental  units)  were  stored 
at  each  of  5  lengths  of  storage  time  (treatments).  When 
ascorbic  acid  determinations  were  made  after  storage,  two 
determinations  were  made  on  each  pint  (samples  within  exper 
imental  units). 

As you can well imagine, the addition of subsampling to the
experimental program will have an effect on the eventual analysis.
First, let us see what changes are required in the assumed statistical
model. Under conditions such as have been described above, the
appropriate model is

$$Y_{ijk} = \mu + \tau_i + \epsilon_{ij} + \eta_{ijk}; \qquad i = 1, \dots, t;\quad j = 1, \dots, n_i;\quad k = 1, \dots, n_{ij} \tag{11.18}$$

where $\mu$ is the true mean effect, $\tau_i$ is the true effect of the ith
treatment, $\epsilon_{ij}$ is the true effect of the jth experimental unit
subjected to the ith treatment, and $\eta_{ijk}$ is the true effect of the
kth sample taken from the jth experimental unit subjected to the ith
treatment. Proceeding as before, we assume that $\mu$ is a constant, that
the $\epsilon_{ij}$ are NID$(0, \sigma^2)$, and that the $\eta_{ijk}$ are
NID$(0, \sigma_\eta^2)$. Of course, this still leaves the nature of the
$\tau_i$ unspecified. That is, do we assume Model I or II? This decision
will depend on the manner in which the treatments involved in the
experiment were selected. (NOTE: In most experimental situations,
Model I is appropriate because the researcher generally selects his
treatments in a nonrandom fashion and is only interested in making
inferences about the treatments actually present in the experiment.
However, since Model II better fits some situations, we will continue
to give it consideration.)

In order to simplify as much as possible the presentation of the
method of calculating the various sums of squares, let us adopt the
following notation:

$N$ = total number of observations in the whole experiment
$$N = \sum_{i=1}^{t}\sum_{j=1}^{n_i} n_{ij}$$

$E_{ij}$ = total of the $n_{ij}$ observations on the jth experimental unit
subjected to the ith treatment
$$E_{ij} = \sum_{k=1}^{n_{ij}} Y_{ijk}$$

$T_i$ = total of all the $\sum_{j} n_{ij}$ observations on the ith treatment
$$T_i = \sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} Y_{ijk} = \sum_{j=1}^{n_i} E_{ij}$$

$T$ = total of all N observations
$$T = \sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} Y_{ijk} = \sum_{i=1}^{t}\sum_{j=1}^{n_i} E_{ij} = \sum_{i=1}^{t} T_i.$$

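Using the preceding notation, the various sums of squares are found as
shown below.

$$\sum Y^2 = \text{total sum of squares} = \sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} Y_{ijk}^2, \tag{11.19}$$

$$M_{yy} = \text{sum of squares due to the mean} = T^2/N, \tag{11.20}$$

$$T_{yy} = \text{among treatments sum of squares}
= \sum_{i=1}^{t}\Big(\sum_{j=1}^{n_i} n_{ij}\Big)(\bar{Y}_i - \bar{Y})^2
= \sum_{i=1}^{t}\Big(T_i^2 \Big/ \sum_{j=1}^{n_i} n_{ij}\Big) - M_{yy}, \tag{11.21}$$

$$E_{yy} = \text{experimental error sum of squares}
= \sum_{i=1}^{t}\sum_{j=1}^{n_i} n_{ij}(\bar{Y}_{ij} - \bar{Y}_i)^2
= \sum_{i=1}^{t}\sum_{j=1}^{n_i}\big(E_{ij}^2/n_{ij}\big) - \sum_{i=1}^{t}\Big(T_i^2 \Big/ \sum_{j=1}^{n_i} n_{ij}\Big), \tag{11.22}$$

and

$$S_{yy} = \text{pooled sum of squares among samples on the same experimental unit}
= \sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} (Y_{ijk} - \bar{Y}_{ij})^2
= \sum Y^2 - M_{yy} - T_{yy} - E_{yy} \tag{11.23}$$

where

$\bar{Y}_{ij}$ = average of the $n_{ij}$ observations on the jth experimental
unit subjected to the ith treatment $= E_{ij}/n_{ij}$,

$\bar{Y}_i$ = average of all observations on the ith treatment
$= T_i \big/ \sum_{j} n_{ij}$,

and

$\bar{Y}$ = average of all observations in the whole experiment $= T/N$.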

TABLE 11.13-Generalized ANOVA for a Completely Randomized Design
With Subsampling (Unequal Numbers: Model I)

Source of            Degrees of                                   Sum of       Mean     Expected Mean Square
Variation            Freedom                                      Squares      Square
Mean                 1                                            $M_{yy}$     M
Treatments           $t - 1$                                      $T_{yy}$     T        $\sigma_\eta^2 + c_2\sigma^2 + \sum_{i=1}^{t}\big(\sum_{j=1}^{n_i} n_{ij}\big)\tau_i^2/(t-1)$
Experimental error   $\sum_{i=1}^{t}(n_i - 1)$                    $E_{yy}$     E        $\sigma_\eta^2 + c_1\sigma^2$
Sampling error       $\sum_{i=1}^{t}\sum_{j=1}^{n_i}(n_{ij}-1)$   $S_{yy}$     S        $\sigma_\eta^2$
Total                N                                            $\sum Y^2$

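These results would then be presented as in Table 11.13 in which the
constants $c_1$ and $c_2$ are defined by

$$c_1 = \frac{N - \sum_{i=1}^{t}\Big(\sum_{j=1}^{n_i} n_{ij}^2 \Big/ \sum_{j=1}^{n_i} n_{ij}\Big)}{\sum_{i=1}^{t}(n_i - 1)} \tag{11.24}$$

and

$$c_2 = \frac{\sum_{i=1}^{t}\Big(\sum_{j=1}^{n_i} n_{ij}^2 \Big/ \sum_{j=1}^{n_i} n_{ij}\Big) - \sum_{i=1}^{t}\sum_{j=1}^{n_i} n_{ij}^2 \Big/ N}{t - 1}. \tag{11.25}$$

Had Model II been assumed, the ANOVA presented in Table 11.13
would be exactly the same except for the "expected mean square for
treatments," which would appear as
$\sigma_\eta^2 + c_2\sigma^2 + c_3\sigma_\tau^2$ where

$$c_3 = \frac{N - \sum_{i=1}^{t}\Big(\sum_{j=1}^{n_i} n_{ij}\Big)^2 \Big/ N}{t - 1}. \tag{11.26}$$
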
Example  11.5 

Consider an experiment to investigate the fermentative conversion of
sugar to lactic acid. We wish to compare the abilities of two
microorganisms to carry out this conversion. A quantity of substrate is
prepared and divided into two unequal portions. Each portion is then
divided into a number of 100 ml. subportions (experimental units) as
follows: No. 1, 4 units; No. 2, 3 units. Each of the 100 ml. units is
inoculated with one or the other of the two microorganisms, the 4 units
being inoculated with microorganism No. 1 and the 3 units with
microorganism No. 2. The fermentation is allowed to proceed for 24
hours, and then each experimental unit (100 ml. subportion) is
examined for the amount of residual sugar, expressed as mg. per 5 cc.,
to determine the amount of change produced by each microorganism,
the converted sugar having been shown previously to occur as lactic
acid. Varying numbers of determinations are made on each sample.
The data are recorded in Table 11.14.


TABLE 11.14-Amount of Unconverted Sugar in the Substrate Following a
24-Hour Fermentation Due to Two Different Microorganisms
(Coded data for easy calculation)

                      Microorganism No. 1              Microorganism No. 2
                         Sample number                    Sample number
Determinations       1       2       3       4         1       2       3
1                   5.6     5.0     5.4     5.3       7.6     7.4     7.5
2                   5.7     5.0     5.4     5.5       7.6     7.0     7.6
3                           5.1     5.4               7.8     7.2     7.5
4                                   5.5                               7.4
5                                   5.4
Sums               11.3    15.1    27.1    10.8      23.0    21.6    30.0
$n_{ij}$              2       3       5       2         3       3       4

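Following the calculational procedure outlined, we obtain:

$t = 2$;  $n_1 = 4$, $n_2 = 3$;  $N = 22$
$n_{11} = 2$, $n_{12} = 3$, $n_{13} = 5$, $n_{14} = 2$
$n_{21} = 3$, $n_{22} = 3$, $n_{23} = 4$
$E_{11} = 11.3$, $E_{12} = 15.1$, $E_{13} = 27.1$, $E_{14} = 10.8$
$E_{21} = 23.0$, $E_{22} = 21.6$, $E_{23} = 30.0$
$T_1 = 64.3$, $T_2 = 74.6$, $T = 138.9$

and hence

$$\sum Y^2 = 902.07$$
$$M_{yy} = (138.9)^2/22 = 876.9641$$
$$T_{yy} = \left[\frac{(64.3)^2}{12} + \frac{(74.6)^2}{10}\right] - 876.9641 = 901.0568 - 876.9641 = 24.0927$$
$$E_{yy} = \left[\frac{(11.3)^2}{2} + \frac{(15.1)^2}{3} + \frac{(27.1)^2}{5} + \frac{(10.8)^2}{2} + \frac{(23.0)^2}{3} + \frac{(21.6)^2}{3} + \frac{(30.0)^2}{4}\right] - 901.0568 = 901.9036 - 901.0568 = 0.8468$$
$$S_{yy} = 902.07 - 876.9641 - 24.0927 - 0.8468 = 0.1664.$$

These results are presented in ANOVA form in Table 11.15.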


TABLE 11.15-ANOVA for Fermentation Data of Table 11.14

Source of            Degrees of   Sum of      Mean        Expected Mean Square
Variation            Freedom      Squares     Square
Mean                      1       876.9641    876.9641
Microorganisms            1        24.0927     24.0927    $\sigma_\eta^2 + c_2\sigma^2 + \sum_{i=1}^{2}\big(\sum_{j} n_{ij}\big)\tau_i^2$
Experimental error        5         0.8468      0.1694    $\sigma_\eta^2 + c_1\sigma^2$
Sampling error           15         0.1664      0.0111    $\sigma_\eta^2$
Total                    22       902.0700
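
The subsampling calculations of Example 11.5 lend themselves to a short program. The sketch below is illustrative only: the function name `nested_anova` and the nested-list layout are assumptions, and the coefficients `c1`, `c2`, `c3` are computed from the standard unbalanced-nested-design formulas given as Equations (11.24) through (11.26). Working from the raw data, the sampling error sum of squares comes out as 0.1663 rather than the 0.1664 obtained from rounded intermediate values.

```python
from typing import Dict, List


def nested_anova(data: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Sums of squares and EMS coefficients for a completely randomized
    design with subsampling (unequal numbers). Hypothetical helper."""
    units = [unit for trt in data.values() for unit in trt]
    N = sum(len(u) for u in units)
    t = len(data)
    total = sum(sum(u) for u in units)

    total_ss = sum(y * y for u in units for y in u)
    m_yy = total ** 2 / N
    # sum over treatments of T_i^2 / (number of observations on treatment i)
    trt_term = sum(sum(sum(u) for u in trt) ** 2 / sum(len(u) for u in trt)
                   for trt in data.values())
    # sum over experimental units of E_ij^2 / n_ij
    unit_term = sum(sum(u) ** 2 / len(u) for u in units)

    df_error = sum(len(trt) - 1 for trt in data.values())
    df_sampling = sum(len(u) - 1 for u in units)

    a = sum(sum(len(u) ** 2 for u in trt) / sum(len(u) for u in trt)
            for trt in data.values())
    b = sum(len(u) ** 2 for u in units) / N
    c = sum(sum(len(u) for u in trt) ** 2 for trt in data.values()) / N

    return {
        "M_yy": m_yy,
        "T_yy": trt_term - m_yy,
        "E_yy": unit_term - trt_term,
        "S_yy": total_ss - unit_term,
        "df_error": df_error,
        "df_sampling": df_sampling,
        "c1": (N - a) / df_error,
        "c2": (a - b) / (t - 1),
        "c3": (N - c) / (t - 1),
    }


# Data of Table 11.14: one inner list per experimental unit (sample).
fermentation = {
    "microorganism 1": [[5.6, 5.7], [5.0, 5.0, 5.1],
                        [5.4, 5.4, 5.4, 5.5, 5.4], [5.3, 5.5]],
    "microorganism 2": [[7.6, 7.6, 7.8], [7.4, 7.0, 7.2],
                        [7.5, 7.6, 7.5, 7.4]],
}

out = nested_anova(fermentation)
for name, value in out.items():
    print(f"{name:>12}: {value:.4f}")
```

For these data the coefficients are unequal ($c_1 \approx 3.02$, $c_2 \approx 3.45$), so the treatment and experimental error mean squares do not share the same multiple of $\sigma^2$.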

On examination of the expected mean squares in Tables 11.13
and 11.15, it is seen that an exact test of the hypothesis
$H\colon \tau_i = 0$ $(i = 1, \dots, t)$ is impossible. This unfortunate
circumstance results from the fact that $c_1 \neq c_2$, and this is so
because of the unequal numbers of samples per experimental unit and the
unequal numbers of experimental units per treatment. This result clearly
attests to the desirability of equal numbers of observations in the
various subclasses, and for this reason the statistician always
recommends "equal frequencies" when he is consulted at the design stage
of any research project.

What,  then,  can  be  done  in  a  situation  such  as  described  above? 
That  is,  since  unequal  frequencies  are  sometimes  inevitable,  is  there 
any  approximate  test  procedure  that  can  be  used?  There  is.  However, 
discussion  of  this  approximation  will  be  deferred  until  Section  11.7. 

Before terminating the present discussion of subsampling in a
completely randomized design, the simplifications associated with equal
frequencies will be demonstrated. If, in a completely randomized
design, there are t treatments, n experimental units per treatment, and
m samples per experimental unit, the appropriate statistical model is

$$Y_{ijk} = \mu + \tau_i + \epsilon_{ij} + \eta_{ijk}; \qquad i = 1, \dots, t;\quad j = 1, \dots, n;\quad k = 1, \dots, m \tag{11.27}$$

where all terms are defined as before. The calculations are now
specified by

$$\sum Y^2 = \text{total sum of squares} = \sum_{i=1}^{t}\sum_{j=1}^{n}\sum_{k=1}^{m} Y_{ijk}^2, \tag{11.28}$$

$$M_{yy} = \text{sum of squares due to the mean} = T^2/tnm, \tag{11.29}$$

$$T_{yy} = \text{treatment sum of squares} = nm\sum_{i=1}^{t}(\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{t} T_i^2/nm - M_{yy}, \tag{11.30}$$

$$E_{yy} = \text{experimental error sum of squares} = m\sum_{i=1}^{t}\sum_{j=1}^{n}(\bar{Y}_{ij} - \bar{Y}_i)^2 = \sum_{i=1}^{t}\sum_{j=1}^{n} E_{ij}^2/m - \sum_{i=1}^{t} T_i^2/nm, \tag{11.31}$$

and

$$S_{yy} = \sum Y^2 - M_{yy} - T_{yy} - E_{yy} \tag{11.32}$$

where

$$E_{ij} = \sum_{k=1}^{m} Y_{ijk}, \tag{11.33}$$

$$T_i = \sum_{j=1}^{n}\sum_{k=1}^{m} Y_{ijk} = \sum_{j=1}^{n} E_{ij}, \tag{11.34}$$

$$T = \sum_{i=1}^{t}\sum_{j=1}^{n}\sum_{k=1}^{m} Y_{ijk} = \sum_{i=1}^{t}\sum_{j=1}^{n} E_{ij} = \sum_{i=1}^{t} T_i, \tag{11.35}$$

$$\bar{Y}_{ij} = E_{ij}/m, \tag{11.36}$$

$$\bar{Y}_i = T_i/nm, \tag{11.37}$$

and

$$\bar{Y} = T/tnm. \tag{11.38}$$

TABLE 11.16-Generalized ANOVA for a Completely Randomized
Design with Subsampling (Equal Numbers: Model I
and Model II)

Source of            Degrees of    Sum of       Mean     Expected Mean Square
Variation            Freedom       Squares      Square
Mean                 1             $M_{yy}$     M
Treatments           $t - 1$       $T_{yy}$     T        $\sigma_\eta^2 + m\sigma^2 + nm\sum_{i=1}^{t}\tau_i^2/(t-1)$
                                                         or $\sigma_\eta^2 + m\sigma^2 + nm\sigma_\tau^2$
Experimental error   $t(n - 1)$    $E_{yy}$     E        $\sigma_\eta^2 + m\sigma^2$
Sampling error       $tn(m - 1)$   $S_{yy}$     S        $\sigma_\eta^2$
Total                $tnm$         $\sum Y^2$

These sums of squares would then be presented in ANOVA form as in
Table 11.16. Examination of the expected mean squares in Table 11.16
indicates that, because of the equal frequencies, there will be no
difficulty in testing $H\colon \tau_i = 0$ $(i = 1, \dots, t)$ or
$H\colon \sigma_\tau^2 = 0$. In addition, the components of variance are
easily estimated by

$$s_\eta^2 = S \tag{11.39}$$

and

$$s^2 = (E - S)/m. \tag{11.40}$$

And, finally, the standard error of a treatment mean is given by

$$s_{\bar{Y}_i} = \sqrt{E/nm}. \tag{11.41}$$

(NOTE: Although not explicitly stated, it should be clear that
$\sigma^2$ and $\sigma_\eta^2$ could also have been estimated when unequal
frequencies occur. This can be seen by studying the expected mean
squares in Table 11.13.)

Example  11.6 

An agronomist conducted a field trial to compare the relative effects
of 5 particular fertilizers on the yield of Trebi barley. Thirty
homogeneous experimental plots were available and 6 were assigned at
random to each fertilizer treatment. At harvest time, 3 sample quadrats
were taken (at random) from each experimental plot and the yield
was obtained for each of the 90 quadrats. The data, in coded form, are
given in Table 11.17.

TABLE 11.17-Coded Values of Yields from Ninety Sample Quadrats

                Fertilizer Treatments
       1        2        3        4        5
      57       67       95      102      123
      46       72       90       88      101
      28       66       89      109      113

      26       44       92       96       93
      38       68       89       89      110
      20       64      106      106      115

      39       57       91      102      112
      39       61       82       93      104
      43       61       98       98      112

      23       74      105      103      120
      36       47       85       90      101
      18       69       85      105      111

      48       61       78       99      113
      35       60       89       87      109
      48       75       95      113      111

      50       68       85      117      124
      37       65       74       93      102
      19       61       80      107      118

Under each treatment, the 18 observations are arranged in six groups of
three. Each group consists of the observed yields on the three quadrats
taken from a single experimental plot.

Using Equations (11.28) through (11.32), we obtain:

$$\sum Y^2 = 646{,}285$$
$$M_{yy} = (7187)^2/90 = 573{,}921.88$$
$$T_{yy} = [(650)^2 + (1140)^2 + (1608)^2 + (1797)^2 + (1992)^2]/18 - 573{,}921.88 = 639{,}168.72 - 573{,}921.88 = 65{,}246.84$$
$$E_{yy} = [(131)^2 + \cdots + (344)^2]/3 - 639{,}168.72 = 641{,}001.67 - 639{,}168.72 = 1{,}832.95$$
$$S_{yy} = 5{,}283.33 \text{ (by subtraction).}$$

These are summarized in Table 11.18. It is easily verified that
F = 222.47, with $\nu_1 = 4$ and $\nu_2 = 25$ degrees of freedom, is highly
significant, and thus the hypothesis $H\colon \tau_i = 0$
$(i = 1, \dots, 5)$ is rejected. (NOTE: An experienced analyst could
probably have predicted this result on examination of the data, but the
analysis and the statistical test make the conclusion an objective one
rather than a subjective one.)

TABLE 11.18-ANOVA for Data of Table 11.17

Source of            Degrees of   Sum of        Mean         Expected Mean Square                                        F-Ratio
Variation            Freedom      Squares       Square
Mean                      1       573,921.88    573,921.88
Fertilizers               4        65,246.84     16,311.71   $\sigma_\eta^2 + 3\sigma^2 + (18/4)\sum_{i=1}^{5}\tau_i^2$    222.47
Experimental error       25         1,832.95         73.32   $\sigma_\eta^2 + 3\sigma^2$
Sampling error           60         5,283.33         88.06   $\sigma_\eta^2$
Total                    90       646,285.00

In case a confidence interval estimate of a treatment mean is desired,
the standard error of a treatment mean is calculated. Its value is
$\sqrt{E/nm} = \sqrt{(73.32)/18} = 2.02$. It is also clear that
components of variance may be estimated in a simple manner. For example,
$s_\eta^2 = 88.06$. However, when an estimate of $\sigma^2$ is sought,
the calculations yield $s^2 = (73.32 - 88.06)/3 < 0$. Since $\sigma^2$,
by definition, cannot be negative, it is unreasonable to quote a negative
estimate. Thus, in the present situation, the "best" estimate of
$\sigma^2$ will be taken to be zero, even though this is a biased
estimate. More will be said about the implications of this in a later
section. For the moment, we shall be content with observing that
apparently the variation among the true effects of different
experimental units is small, and thus the researcher might consider less
replication (fewer experimental units per treatment) in a future
experiment of this type.
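
The calculations of Example 11.6 can be reproduced from the raw data of Table 11.17. The following is a sketch under stated assumptions, not the author's procedure: the variable names are illustrative, the rows are entered exactly as they appear in the table (three consecutive rows per plot), and the negative estimate of $\sigma^2$ is replaced by zero as the text suggests. With full precision the F-ratio prints as 222.48 rather than the 222.47 obtained from rounded mean squares.

```python
from math import sqrt

# Table 11.17: 18 rows (quadrats) by 5 columns (fertilizer treatments).
# Every three consecutive rows belong to one experimental plot.
rows = [
    (57, 67, 95, 102, 123), (46, 72, 90, 88, 101), (28, 66, 89, 109, 113),
    (26, 44, 92, 96, 93), (38, 68, 89, 89, 110), (20, 64, 106, 106, 115),
    (39, 57, 91, 102, 112), (39, 61, 82, 93, 104), (43, 61, 98, 98, 112),
    (23, 74, 105, 103, 120), (36, 47, 85, 90, 101), (18, 69, 85, 105, 111),
    (48, 61, 78, 99, 113), (35, 60, 89, 87, 109), (48, 75, 95, 113, 111),
    (50, 68, 85, 117, 124), (37, 65, 74, 93, 102), (19, 61, 80, 107, 118),
]
t, n, m = 5, 6, 3  # treatments, plots per treatment, quadrats per plot

columns = list(zip(*rows))                         # one tuple per treatment
plot_totals = [sum(col[3 * j: 3 * j + 3])          # E_ij
               for col in columns for j in range(n)]
treatment_totals = [sum(col) for col in columns]   # T_i
grand_total = sum(treatment_totals)                # T

total_ss = sum(y * y for row in rows for y in row)
m_yy = grand_total ** 2 / (t * n * m)
t_yy = sum(ti ** 2 for ti in treatment_totals) / (n * m) - m_yy
e_yy = sum(e ** 2 for e in plot_totals) / m - m_yy - t_yy
s_yy = total_ss - m_yy - t_yy - e_yy

T = t_yy / (t - 1)
E = e_yy / (t * (n - 1))
S = s_yy / (t * n * (m - 1))
F = T / E

s_eta2 = S                      # Equation (11.39)
s2 = max((E - S) / m, 0.0)      # Equation (11.40), negative estimate set to zero
se_mean = sqrt(E / (n * m))     # Equation (11.41)

print(f"T_yy = {t_yy:.2f}, E_yy = {e_yy:.2f}, S_yy = {s_yy:.2f}")
print(f"T = {T:.2f}, E = {E:.2f}, S = {S:.2f}, F = {F:.2f}")
print(f"s_eta^2 = {s_eta2:.2f}, s^2 = {s2:.2f}, standard error = {se_mean:.2f}")
```

Truncating the estimate at zero is the convention adopted in the text for this example; other treatments of negative variance component estimates exist and are not shown here.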

The reader will, no doubt, have realized that the concept of
subsampling may be extended to many stages. That is, we can have
"samples within samples within samples . . . ," and the resulting
ANOVA would reflect such multi-stage subsampling by partitioning the
total sum of squares into many more parts. Rather than continue the
discussion in general terms, we shall rely on problems at the end of the
chapter to illustrate not only the principles involved, but also the
mechanics of the appropriate calculations.

11.5     EXPECTED MEAN SQUARES, COMPONENTS OF
VARIANCE, VARIANCES OF TREATMENT MEANS,
AND RELATIVE EFFICIENCY

In  Sections  11.2  and  11.4,  the  reader  was  introduced  to  the  concepts 
of  components  of  variance,  expected  mean  squares,  and  variances  of 
treatment  means.  In  those  sections,  no  reasons  were  given  as  to  why 
the  expected  mean  squares  contained  the  indicated  components  of 
variance  nor  why  the  coefficients  of  the  components  of  variance  were 
as  given.  We  now  propose  to  remove  this  deficiency.  In  addition,  a 
scheme  will  be  proposed  that  permits  the  estimation  of  the  relative 
efficiency  of  different  proposed  designs  involving  various  degrees  of 
subsampling.  The  discussion  will  be  conducted  with  reference  to  Tables 
11.16  and  11.18. 

Reference  to  Tables  11.16  and  11.18  shows  that  the  expected  mean 
square  for  sampling  error  contains  only  one  component  of  variance. 
This is so because the only factor which affects (or causes or produces)
the variation "among samples within experimental units" is the
$\eta_{ijk}$ factor. However, the expected mean square for experimental
error contains two components of variance since this source of variation
reflects
the  variation  among  the  means  of  the  samples  taken  from  each  experi 
mental  unit,  and  these  means  will  vary  not  only  because  of  the  varia 
tion  from  experimental  unit  to  experimental  unit,  but  also  because  of 
the  variation  among  the  samples  taken  from  each  experimental  unit. 
To  discuss  the  expected  mean  square  for  treatments,  it  is  appropriate 
to  consider  first  the  sum  of  squares.  The  treatment  sum  of  squares 
reflects  the  variation  among  the  means  of  all  the  observations  (on 
samples)  recorded  for  each  treatment.  Now,  these  means  will  vary 
because  of  three  contributing  factors:  (1)  variation  among  treatments 
(fertilizers),  (2)  variation  among  experimental  units  (plots)  within 
treatments,  and  (3)  variation  among  samples  (quadrats)  within  experi 
mental units. Thus, the expected mean square involves three components
of variance if Model II is assumed, or two components of variance and
one sum of squares if Model I is assumed. (NOTE: The reader may verify
the reasonableness of the foregoing remarks by substituting the assumed
linear statistical model for $Y_{ijk}$ in the expressions for the various
sums of squares.)

How were the various coefficients in the expected mean squares
determined? The coefficient of $\sigma_\eta^2$ is 1 (and thus not shown)
because this reflects the variation among individual samples. The
coefficient of $\sigma^2$ is m (m = 3 in Table 11.18) because there were
m observations (samples) per experimental unit. The coefficient of
$\sigma_\tau^2$ when Model II is assumed, or of
$\sum_{i=1}^{t}\tau_i^2/(t-1)$ when Model I is assumed, is nm because
there were nm observations (m samples on each of n experimental units)
per treatment. In Table 11.18, n = 6 and m = 3. We might note that
another way of expressing the justification of the coefficients described
above is to say that each treatment mean is the average of nm
observations, while each experimental unit mean is the average of m
observations.
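
The remark about averages can be made concrete with a short derivation. What follows is a sketch under the equal-numbers model (11.27), with $\tau_i$ fixed (Model I); the dot notation for averages is introduced here for convenience and does not appear in the original text.

```latex
% Mean of the m samples on one experimental unit:
\bar{Y}_{ij\cdot} = \mu + \tau_i + \epsilon_{ij} + \bar{\eta}_{ij\cdot},
\qquad
\operatorname{Var}(\bar{Y}_{ij\cdot}) = \sigma^2 + \frac{\sigma_\eta^2}{m}.

% The experimental error mean square is m times the variance among unit
% means within a treatment, so
E[\text{E}] = m\left(\sigma^2 + \frac{\sigma_\eta^2}{m}\right)
            = \sigma_\eta^2 + m\sigma^2 .

% Mean of the nm observations on one treatment:
\bar{Y}_{i\cdot\cdot} = \mu + \tau_i + \bar{\epsilon}_{i\cdot} + \bar{\eta}_{i\cdot\cdot},
\qquad
\operatorname{Var}(\bar{Y}_{i\cdot\cdot}) = \frac{\sigma^2}{n} + \frac{\sigma_\eta^2}{nm}.

% The treatment mean square is nm times the variance among treatment means, so
E[\text{T}] = nm\left(\frac{\sigma^2}{n} + \frac{\sigma_\eta^2}{nm}\right)
              + nm\,\frac{\sum_{i=1}^{t}\tau_i^2}{t-1}
            = \sigma_\eta^2 + m\sigma^2 + nm\,\frac{\sum_{i=1}^{t}\tau_i^2}{t-1}.
```

Each coefficient is therefore the number of observations averaged into the mean whose variation the mean square measures, which agrees with the entries of Table 11.16.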

The estimation of the various components of variance has been well
illustrated in the preceding sections. However, a recapitulation will be
made to summarize the procedure. Since S is an unbiased estimator
of $\sigma_\eta^2$, it is reasonable to write

$$s_\eta^2 = S. \tag{11.42}$$

Similarly, E is an unbiased estimator of $\sigma_\eta^2 + m\sigma^2$, and
thus we write

$$s_\eta^2 + m s^2 = E. \tag{11.43}$$

If, then, we combine Equations (11.42) and (11.43) as shown in Equation
(11.44), an unbiased estimator of $\sigma^2$ is obtained:

$$s^2 = \frac{E - s_\eta^2}{m} = \frac{E - S}{m}. \tag{11.44}$$

Now  that  the  preceding  estimates  are  available,  it  is  possible  to 
determine  (subject  to  sampling  variation,  of  course)  which  factor  is 
contributing  the  most  to  the  observed  variation.  Then,  perhaps,  an 
improvement  can  be  made  in  experimental  technique,  or  the  design 
layout  (configuration)  can  be  changed,  to  better  control  the  variation 
in future experiments of the same type. To pursue this aspect of
analysis, the concept of "relative efficiency" of one design compared to
another design of the same type but involving different numbers of
experimental units and/or samples will be investigated.

Before  such  a  comparison  can  be  made,  a  criterion  for  measuring 
efficiency  must  be  established.  The  criterion  adopted  in  this  book  will 
involve  the  estimated  variance  of  a  treatment  mean.  We  will  say  that 
a  design  which  provides  a  smaller  estimated  variance  of  a  treatment 
mean  than  does  some  other  design  is  the  more  efficient  of  the  two. 

With reference to Table 11.16, and in agreement with the definition
given earlier, the estimated variance of a treatment mean is

$$\hat{V}(\bar{Y}_i) = \frac{\text{estimated variance of the individual items contributing to the mean}}{\text{number of items (observations) averaged to get the mean}}
= \frac{s_\eta^2 + m s^2}{nm} = \frac{s_\eta^2}{nm} + \frac{s^2}{n}. \tag{11.45}$$

Examination of Equation (11.45) leads to the following conclusions:

(1)  If the estimates of the components of variance, $s^2$ and
$s_\eta^2$, remain relatively constant, an increase in n or m (or
both) will result in a smaller estimated variance of a treatment
mean.

(2)  An increase in n (the number of experimental units per
treatment) will have more of an effect than an increase in m (the
number of samples per experimental unit) in reducing
$\hat{V}(\bar{Y}_i)$. This supports the statement made in Section
10.16 to the effect that "It (replication) enables us to obtain a
more precise estimate of the mean effect of any factor. . . ."

(3)  If either $s^2$ or $s_\eta^2$ (or both) can be made smaller,
$\hat{V}(\bar{Y}_i)$ can be made smaller. This could be accomplished
by choosing more homogeneous experimental units or by improving the
experimental technique.

Let us now return to the problem of estimating the efficiency of a
proposed design relative to the design used. To do this, we must first
estimate what the variance of a treatment mean would be if the
proposed design were used. Assuming that: (1) the proposed design would
involve n' experimental units per treatment and m' samples per
experimental unit and (2) the estimates of $\sigma^2$ and $\sigma_\eta^2$
would remain unchanged, the new estimated variance of a treatment mean
would be

$$\hat{V}'(\bar{Y}_i) = \frac{s_\eta^2 + m' s^2}{n' m'}. \tag{11.46}$$

If $\hat{V}'(\bar{Y}_i) < \hat{V}(\bar{Y}_i)$, the proposed design is said
to be more efficient than the present design; if
$\hat{V}'(\bar{Y}_i) > \hat{V}(\bar{Y}_i)$, the proposed design is said to
be less efficient than the present design. Thus, as a measure of relative
efficiency, we use the ratio of $\hat{V}(\bar{Y}_i)$ and
$\hat{V}'(\bar{Y}_i)$. If the efficiency of the proposed (new) design
relative to the present (old) design is desired, one calculates (in per
cent)

$$\text{R.E. of new to old} = 100\,[\hat{V}(\bar{Y}_i)/\hat{V}'(\bar{Y}_i)], \tag{11.47}$$

while if the efficiency of the present (old) design relative to the
proposed (new) design is desired, one calculates (in per cent)

$$\text{R.E. of old to new} = 100\,[\hat{V}'(\bar{Y}_i)/\hat{V}(\bar{Y}_i)]. \tag{11.48}$$

Some texts use the concept of "relative information" and it would be
wise for us to see what relationship this bears to relative efficiency.
If information is defined as the reciprocal of the variance, then it is
only a matter of simple algebra to show that relative information is
the same as relative efficiency. For example,

$$\text{R.I. of old to new} = \frac{1/\hat{V}(\bar{Y}_i)}{1/\hat{V}'(\bar{Y}_i)} \times 100 = \text{R.E. of old to new}. \qquad (11.49)$$

Similarly,

$$\text{R.I. of new to old} = \text{R.E. of new to old}. \qquad (11.50)$$

It should be noted that there are other definitions of relative
information to be found in the literature (e.g., Yates: Design and
Analysis of Factorial Experiments) which differ from relative
efficiency. However, if we define our terms as above, the two concepts
may be used interchangeably.

Example 11.7

The experiment on frozen strawberries discussed in (3) in the first
paragraph of Section 11.4 was performed. However, all that is available
is the abbreviated ANOVA of Table 11.19. The estimates of the
components of variance are $s_\delta^2 = 5$ and
$s^2 = (20 - 5)/2 = 7.5$, where the symbol $\delta$ is used to denote
determinations (rather than $\eta$ to denote samples). The estimated
variance of a treatment mean is

$$\hat{V}(\bar{Y}_i) = \frac{5 + 2(7.5)}{10(2)} = \frac{20}{20} = 1.$$

The question is then asked, "Is the present design more or less
efficient than a similar design employing 6 pints per storage time and
3 determinations per pint?" Calculating

$$\hat{V}'(\bar{Y}_i) = \frac{5 + 3(7.5)}{6(3)} = 1.53,$$

the answer is, "The present design is more efficient than the proposed
design." In fact, the efficiency of the present design relative to the
proposed design is: R.E. of old to new = 100(1.53/1) = 153 per cent.
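
The arithmetic of Example 11.7 can be retraced with a short Python sketch. It is only an illustration: the function names `variance_of_treatment_mean` and `relative_efficiency` are hypothetical, and the sketch assumes the estimated variance of a treatment mean has the form $(s_\delta^2 + ms^2)/nm$ used in the example.

```python
def variance_of_treatment_mean(s2_sampling, s2_units, n, m):
    """Estimated variance of a treatment mean, (s_delta^2 + m * s^2) / (n * m)."""
    return (s2_sampling + m * s2_units) / (n * m)


def relative_efficiency(v_a, v_b):
    """R.E. (per cent) of design A relative to design B: 100 * V_B / V_A."""
    return 100.0 * v_b / v_a


# Estimated components of variance taken from Table 11.19.
s2_delta = 5.0           # between determinations on pints treated alike
s2 = (20.0 - 5.0) / 2    # among pints treated alike

v_old = variance_of_treatment_mean(s2_delta, s2, n=10, m=2)  # present design
v_new = variance_of_treatment_mean(s2_delta, s2, n=6, m=3)   # proposed design

print(f"present design:  {v_old:.2f}")
print(f"proposed design: {v_new:.2f}")
print(f"R.E. of old to new: {relative_efficiency(v_old, v_new):.0f} per cent")
```

Swapping the two arguments of `relative_efficiency` gives the efficiency of the proposed design relative to the present one.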

TABLE 11.19-Abbreviated ANOVA of Ascorbic Acid
Content of Frozen Strawberries

Source of Variation            Degrees of   Sum of    Mean     Expected
                               Freedom      Squares   Square   Mean Square

Among storage times                 4         400       100    $\sigma_\delta^2 + 2\sigma^2 + 20\sum\tau_i^2/4$
Among pints treated alike          45         900        20    $\sigma_\delta^2 + 2\sigma^2$
Between determinations on
  pints treated alike              50         250         5    $\sigma_\delta^2$

11.6      SOME  REMARKS  CONCERNING  F-RATIOS  THAT
ARE  LESS  THAN  UNITY

In all the examples considered so far, the calculated F-values have
been greater than unity. Thus, in each of these cases, the only decision
to be made by the analyst was whether the calculated value should be
termed statistically significant or nonsignificant. If significant, the
hypothesis $H: \tau_i = 0\ (i = 1, \ldots, t)$ or $H: \sigma_\tau^2 = 0$
was rejected; if not significant, the appropriate hypothesis was not
rejected (perhaps even accepted). However, it is possible (and quite
probable) that a calculated F-value will turn out to be less than
unity. What should our conclusion be in such a situation?

We can, of course, simply say that F was not significant and thus the
hypothesis cannot be rejected. However, such an easy dismissal of the
question is not wise, for it could cause us to ignore a valuable warning
sign. Suppose, as might happen, that F, with $\nu_1$ and $\nu_2$ degrees
of freedom, is so small that $F' = 1/F$, with $\nu_2$ and $\nu_1$
degrees of freedom, is significant. What should our conclusion be in
this case? It appears as though something should be rejected; but what
is it? In this situation, it seems reasonable to reject the postulated
statistical model.

If  the  statistical  model  is  rejected  because  of  a  significant  F'  value, 
what  are  the  steps  that  should  then  be  taken?  Some  of  these  are : 

(1)  The  experimental  procedure  should  be  reviewed  to  see  if  the 
various  assumptions  are  satisfied.  For  example,  if  the  proper 
randomization  was  not  employed,  the  validity  of  the  inde 
pendence  assumption  is  doubtful. 

(2)  If   sufficient   observations   are   available,   the   assumption   of 
normality  could  be  checked  by  plotting  the  data  either  on 
regular  graph  paper  or  on  normal  probability  paper. 

(3)  The  assumption  of  homogeneous  variances  might  be  checked, 
but  this  would  require  a  large  number  of  observations  within 
subclasses. 

(4)  The  underlying  phenomenon  should  be  restudied  to  see  if  the 
assumed  linear  model  is  a  good  approximation  to  the  true  state 
of  affairs.  If,  as  a  result,  the  assumed  model  is  rejected,  a 
search  should  be  made  for  a  new  model  which  better  describes 
the  observed  data  and  the  phenomenon  under  investigation. 

11.7   SATTERTHWAITE'S   APPROXIMATE   TEST   PROCEDURE

When discussing the analysis of a completely randomized design
involving subsampling, it was noted that no exact test of
$H: \tau_i = 0\ (i = 1, \ldots, t)$ was possible when the experiment
involved unequal frequencies at the various stages of subsampling. At
that time, it was promised that an approximate test procedure would be
explained later. We are now ready to fulfill that promise.

The  proposed  approximation,  due  to  Satterthwaite  (29),  proceeds  as 
follows :  Using  estimates  of  the  components  of  variance,  mean  squares  will 
be  synthesized  which  will  have  the  same  expected  value  if  the  hypothesis 
to  be  tested  is  true.  These  synthetic  mean  squares  will  then  be  used  to  form 
a  ratio  which  is  approximately  distributed  as  F. 

How are the synthetic mean squares formed? If we denote the actual
mean squares existing in an ANOVA by $MS_1, MS_2, \ldots, MS_k$, then a
synthetic mean square may be obtained by forming a linear combination
such as

$$L = a_1MS_1 + a_2MS_2 + \cdots + a_kMS_k \qquad (11.51)$$

where the $a_i$ are constants. The degrees of freedom associated with L
are then estimated by

$$\hat{\nu} = \frac{(a_1MS_1 + \cdots + a_kMS_k)^2}{(a_1MS_1)^2/\nu_1 + \cdots + (a_kMS_k)^2/\nu_k} \qquad (11.52)$$

where, of course, $\nu_i$ represents the degrees of freedom associated
with $MS_i\ (i = 1, \ldots, k)$. Sometimes both the numerator and
denominator mean squares (in the approximate F-ratio) will be
synthesized. However, it is more likely that only one synthetic mean
square will be used in any given situation.

Because of the lack of uniqueness of the approximate F-ratio
(different F-ratios could result from the use of different synthetic
mean squares) and because of the necessity of approximating the degrees
of freedom, the procedure is of limited usefulness. However, if used
with care, it can be of value to the researcher and/or statistician.
The reader is referred to Cochran (10) for a further discussion of this
problem.

Example 11.8

Referring to Example 11.5, we recall that an exact test of
$H: \tau_1 = \tau_2 = 0$ was impossible. This was so because
$c_1 \neq c_2$ in Table 11.15. It is decided to form a "synthetic
experimental error mean square" that will have an expected value of
$\sigma_\eta^2 + c_2\sigma^2$. This could be done by calculating

$$L = a_1(0.1694) + a_2(0.0111) = (c_2/c_1)(0.1694) + [1 - (c_2/c_1)](0.0111).$$

The approximate F-ratio would then be F = 24.0927/L with degrees of
freedom $\nu_1 = 1$ and $\nu_2 = \hat{\nu}$, where

$$\hat{\nu} = \frac{L^2}{[a_1(0.1694)]^2/5 + [a_2(0.0111)]^2/15}.$$

The details of the numerical calculations are left as an exercise for
the reader.
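
A minimal sketch of the synthesis step, written in Python, may help with that exercise. The function name `satterthwaite` is hypothetical, and the ratio $c_2/c_1 = 0.9$ used below is an assumed value chosen only for illustration (the actual constants come from Table 11.15); the mean squares 0.1694 (5 degrees of freedom) and 0.0111 (15 degrees of freedom) and the numerator 24.0927 are those quoted in Example 11.8.

```python
def satterthwaite(coeffs, mean_squares, dfs):
    """Return (L, nu_hat) for the synthetic mean square L = sum(a_i * MS_i).

    nu_hat = L^2 / sum((a_i * MS_i)^2 / nu_i) is the approximate
    number of degrees of freedom associated with L.
    """
    terms = [a * ms for a, ms in zip(coeffs, mean_squares)]
    L = sum(terms)
    nu_hat = L ** 2 / sum(term ** 2 / df for term, df in zip(terms, dfs))
    return L, nu_hat


ratio = 0.9  # ASSUMED value of c2/c1, for illustration only
L, nu_hat = satterthwaite([ratio, 1 - ratio], [0.1694, 0.0111], [5, 15])
f_approx = 24.0927 / L  # approximate F with 1 and nu_hat degrees of freedom

print(f"L = {L:.4f}, nu_hat = {nu_hat:.2f}, F = {f_approx:.1f}")
```

When one coefficient is 1 and the other is 0, the function returns the original mean square with its own degrees of freedom, which is a convenient sanity check.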

11.8      SELECTED  TREATMENT  COMPARISONS:  GENERAL 
DISCUSSION 

In  Section  10.15,  the  idea  of  making  specific  comparisons  among 
treatment  means  was  introduced.  At  that  time,  also,  the  concept  of  an 
orthogonal  contrast  was  presented,  and  it  was  suggested  that  orthogo 
nal  contrasts  were  to  be  preferred  over  nonorthogonal  contrasts.  How 
ever,  the  researcher  was  warned  not  to  let  the  statistician's  desire  for 
orthogonality  override  his  (the  researcher's)  needs. 

In this section some general comparisons among treatments will be
examined, not to illustrate the concept of a contrast, but to
demonstrate the manner in which the ANOVA is modified to provide the
proper analysis. Because of the infinitely many possibilities, this will
best be done by discussing a few illustrative cases.

For example, consider an experiment involving t treatments and n
experimental units per treatment in which no subsampling occurred.
If treatment No. 1 were a "control" treatment, it would be of interest
to make the following specific comparisons among the treatments:
(1) treatment No. 1 versus the rest and (2) among the rest. The sums of
squares for these two comparisons would be determined as follows:

$$(C_1)_{yy} = SS(1\ \text{versus rest}) = [T_1^2/n + (T_2 + \cdots + T_t)^2/n(t-1)] - T^2/tn$$

$$(C_2)_{yy} = SS(\text{among the rest}) = (T_2^2 + \cdots + T_t^2)/n - (T_2 + \cdots + T_t)^2/n(t-1).$$

These results, when coupled with the basic ANOVA, would be presented
as in Table 11.20, where the sums of squares (degrees of freedom)
for the selected comparisons are offset to indicate that they are
portions of the treatment sum of squares (degrees of freedom). (NOTE:
In this example, the sums of squares for the two comparisons add up to
the treatment sum of squares. The reader is warned that this will not
always be the case.)

TABLE 11.20-Generalized ANOVA Showing Two Selected Treatment
Comparisons

Source of Variation      Degrees of   Sum of          Mean     F-Ratio
                         Freedom      Squares         Square

Mean                     1            $M_{yy}$        M
Treatments               t - 1        $T_{yy}$        T        T/E
    1 vs. rest             1            $(C_1)_{yy}$    $C_1$    $C_1/E$
    Among the rest         t - 2        $(C_2)_{yy}$    $C_2$    $C_2/E$
Experimental error       t(n - 1)     $E_{yy}$        E

Total                    tn           $\sum Y^2$
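
The partition shown in Table 11.20 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the function name `control_vs_rest` is hypothetical, and the treatment totals are borrowed from Table 11.24 (n = 4) with treatment No. 1 arbitrarily treated as the control.

```python
def control_vs_rest(totals, n):
    """Split the treatment SS into (1 vs. rest) and (among the rest).

    totals[0] is the control total; each total is a sum of n observations.
    Returns (T_yy, SS(1 vs. rest), SS(among the rest)).
    """
    t = len(totals)
    rest = totals[1:]
    rest_sum = sum(rest)
    correction = sum(totals) ** 2 / (t * n)          # T^2 / tn
    pooled_rest = rest_sum ** 2 / (n * (t - 1))      # (T_2 + ... + T_t)^2 / n(t-1)
    ss_treat = sum(T ** 2 for T in totals) / n - correction
    ss_1_vs_rest = totals[0] ** 2 / n + pooled_rest - correction
    ss_among_rest = sum(T ** 2 for T in rest) / n - pooled_rest
    return ss_treat, ss_1_vs_rest, ss_among_rest


# Totals from Table 11.24; calling treatment 1 the "control" is an assumption.
t_yy, c1_yy, c2_yy = control_vs_rest([180, 160, 160, 164, 136], n=4)
print(t_yy, c1_yy, c2_yy)  # the last two values add up to the first
```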

A second illustration based on the same type of design would be the
case in which the t treatments segregate into k groups containing
$t_1, t_2, \ldots, t_k$ treatments, respectively, where

$$\sum_{i=1}^{k} t_i = t.$$

In such a case, the natural comparisons would be: (1) among groups
and (2) among treatments within the ith group; $i = 1, \ldots, k$. The
sum of squares for the first of these k + 1 comparisons would be
calculated as follows:

$$G_{yy} = SS(\text{among groups}) = \left[\frac{(T_1 + \cdots + T_{t_1})^2}{nt_1} + \cdots + \frac{(T_{t-t_k+1} + \cdots + T_t)^2}{nt_k}\right] - T^2/tn.$$

The sum of squares among treatments in the first group is given by

$$(W_1)_{yy} = (T_1^2 + \cdots + T_{t_1}^2)/n - (T_1 + \cdots + T_{t_1})^2/nt_1.$$

The sums of squares among treatments in each of the remaining k - 1
groups would be found in a similar manner. The results would then be
presented in ANOVA form as in Table 11.21. (NOTE: Once again the
sums of squares for the various comparisons add up to the treatment
sum of squares.)

TABLE 11.21-Generalized ANOVA Showing k+1
Selected Treatment Comparisons

Source of Variation      Degrees of   Sum of          Mean     F-Ratio
                         Freedom      Squares         Square

Mean                     1            $M_{yy}$        M
Treatments               t - 1        $T_{yy}$        T        T/E
    Among groups           k - 1        $G_{yy}$        G        G/E
    Within group 1         $t_1 - 1$    $(W_1)_{yy}$    $W_1$    $W_1/E$
    Within group 2         $t_2 - 1$    $(W_2)_{yy}$    $W_2$    $W_2/E$
    . . .
    Within group k         $t_k - 1$    $(W_k)_{yy}$    $W_k$    $W_k/E$
Experimental error       t(n - 1)     $E_{yy}$        E

Total                    tn           $\sum Y^2$

One more general illustration will be given. In this instance, assume
(again) that one treatment is a "control." However, the researcher
wishes to do more than compare: (1) control versus rest and (2) among
the rest. He also wishes to compare, separately, each noncontrol
treatment versus the control. Thus, in addition to the sums of squares
indicated in the first illustration, he would also compute:

$$(C_3)_{yy} = SS(1\ \text{vs}\ 2) = (T_1^2 + T_2^2)/n - (T_1 + T_2)^2/2n$$

$$(C_4)_{yy} = SS(1\ \text{vs}\ 3) = (T_1^2 + T_3^2)/n - (T_1 + T_3)^2/2n$$

$$\cdots$$

$$(C_{t+1})_{yy} = SS(1\ \text{vs}\ t) = (T_1^2 + T_t^2)/n - (T_1 + T_t)^2/2n.$$

These results would then be presented as in Table 11.22. (NOTE: This
time neither the degrees of freedom nor the sums of squares for the
comparisons will add up to the treatment sum of squares.)

TABLE 11.22-Generalized ANOVA Showing t+1
Selected Treatment Comparisons

Source of Variation      Degrees of   Sum of              Mean         F-Ratio
                         Freedom      Squares             Square

Mean                     1            $M_{yy}$            M
Treatments               t - 1        $T_{yy}$            T            T/E
    Control vs. rest       1            $(C_1)_{yy}$        $C_1$        $C_1/E$
    Among rest             t - 2        $(C_2)_{yy}$        $C_2$        $C_2/E$
    Control vs. 2          1            $(C_3)_{yy}$        $C_3$        $C_3/E$
    Control vs. 3          1            $(C_4)_{yy}$        $C_4$        $C_4/E$
    . . .
    Control vs. t          1            $(C_{t+1})_{yy}$    $C_{t+1}$    $C_{t+1}/E$
Experimental error       t(n - 1)     $E_{yy}$            E

Total                    tn           $\sum Y^2$

Example 11.9

The experiment described in Example 10.10 was performed and we
wish to investigate the specified comparisons. Assuming that the data
given in Table 7.20 were the results of this experiment, it is seen
that:

$$(C_1)_{yy} = [(184 + 68)^2/6 + (170 + 378)^2/14] - (800)^2/20 = 34.3$$

$$(C_2)_{yy} = [(184)^2/4 + (68)^2/2] - (252)^2/6 = 192.0$$

$$(C_3)_{yy} = [(170)^2/5 + (378)^2/9] - (548)^2/14 = 205.7.$$

Combining these figures with those of Table 7.21, we get Table 11.23.
Examination of the F-values in Table 11.23 indicates that all the
treatments (electrolytes) differ significantly in their effects on the
characteristic (of the batteries) being studied. (NOTE: See Problem
11.30 for the expected mean squares.)
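
The three sums of squares of Example 11.9 all have the same form (a sum of squared totals, each divided by its number of observations, minus a correction term), so they can be checked with one small helper. The Python sketch below is illustrative only; the names `ss_between` and `pool` are hypothetical, and the treatment totals and numbers of observations are those appearing in the example.

```python
def ss_between(groups):
    """SS among groups given (total, n) pairs: sum(G^2 / n_G) - (sum G)^2 / sum(n_G)."""
    grand_total = sum(total for total, _ in groups)
    n_all = sum(n for _, n in groups)
    return sum(total ** 2 / n for total, n in groups) - grand_total ** 2 / n_all


def pool(pairs):
    """Combine several (total, n) pairs into a single (total, n) pair."""
    return sum(total for total, _ in pairs), sum(n for _, n in pairs)


# (treatment total, number of observations) for electrolytes 1 through 4.
treatments = [(184, 4), (68, 2), (170, 5), (378, 9)]

c1_yy = ss_between([pool(treatments[:2]), pool(treatments[2:])])  # 1 and 2 vs. 3 and 4
c2_yy = ss_between(treatments[:2])                                # 1 vs. 2
c3_yy = ss_between(treatments[2:])                                # 3 vs. 4
t_yy = ss_between(treatments)                                     # all treatments

print(round(c1_yy, 1), round(c2_yy, 1), round(c3_yy, 1), round(t_yy, 1))
```

In this case the three comparison sums of squares add up to the treatment sum of squares.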

11.9      SELECTED  TREATMENT  COMPARISONS:  ORTHOGONAL
AND  NONORTHOGONAL  CONTRASTS

Having spent considerable time discussing treatment comparisons in
general, let us now concentrate on the subject of contrasts, and
particularly on orthogonal contrasts.

It may be verified that the sum of squares associated with a
particular contrast is given by

$$(C_j)_{yy} = \frac{\left(\sum_{i=1}^{t} c_{ij}T_i\right)^2}{\sum_{i=1}^{t} n_ic_{ij}^2} \qquad (11.53)$$

where all symbols except t are defined as in Section 10.15. The symbol
t is used here, rather than k as in Section 10.15, to conform to the
notation being used in the present chapter. If each treatment total
is the sum of the same number of observations (that is, if $n_i = n$
for $i = 1, \ldots, t$), Equation (11.53) simplifies to

$$(C_j)_{yy} = \frac{\left(\sum_{i=1}^{t} c_{ij}T_i\right)^2}{n\sum_{i=1}^{t} c_{ij}^2}. \qquad (11.54)$$

The results would then be presented in an ANOVA in agreement with
the format adopted in the preceding section. [NOTE: If a set of t - 1
orthogonal contrasts among t treatments is investigated, the individual
sums of squares (one for each contrast) will add up to the treatment
sum of squares.]

TABLE 11.23-ANOVA for Experiment of Example 11.9
(Data in Table 7.20) Showing the Analysis of a
Specified Set of Treatment Comparisons

Source of Variation              Degrees of   Sum of    Mean     F-Ratio
                                 Freedom      Squares   Square

Mean                                  1       32,000    32,000
Treatments (electrolytes)             3          432       144     72
    $C_1$ (1 and 2 vs. 3 and 4)         1         34.3      34.3    17.15
    $C_2$ (1 vs. 2)                     1        192.0     192.0    96
    $C_3$ (3 vs. 4)                     1        205.7     205.7   102.85
Experimental error                   16           32         2

Total                                20       32,464

Example 11.10

Consider again the experiment described in Example 10.10 and
analyzed in Example 11.9. The sums of squares associated with the
three contrasts could also have been calculated as follows:

$$(C_1)_{yy} = \frac{[(7)(184) + (7)(68) + (-3)(170) + (-3)(378)]^2}{[4(7)^2 + 2(7)^2 + 5(-3)^2 + 9(-3)^2]} = 34.3$$

$$(C_2)_{yy} = \frac{[(1)(184) + (-2)(68) + (0)(170) + (0)(378)]^2}{[4(1)^2 + 2(-2)^2 + 5(0)^2 + 9(0)^2]} = 192.0$$

$$(C_3)_{yy} = \frac{[(0)(184) + (0)(68) + (9)(170) + (-5)(378)]^2}{[4(0)^2 + 2(0)^2 + 5(9)^2 + 9(-5)^2]} = 205.7.$$

The ANOVA will, of course, be the same as in Table 11.23.
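
The contrast formula for unequal numbers of observations can be sketched in Python as follows. The function names are hypothetical, and the orthogonality check assumes the usual condition for contrasts among totals with unequal $n_i$, namely $\sum n_ic_{ij}c_{ij'} = 0$; that condition is stated here as an assumption rather than quoted from Section 10.15.

```python
def contrast_ss(coeffs, totals, ns):
    """Sum of squares for one contrast: (sum c_i T_i)^2 / sum(n_i c_i^2)."""
    value = sum(c * T for c, T in zip(coeffs, totals))
    return value ** 2 / sum(n * c ** 2 for c, n in zip(coeffs, ns))


def orthogonal(a, b, ns):
    """True if sum(n_i * a_i * b_i) == 0 (assumed orthogonality condition)."""
    return sum(n * x * y for n, x, y in zip(ns, a, b)) == 0


totals = [184, 68, 170, 378]   # treatment totals used in Example 11.10
ns = [4, 2, 5, 9]              # observations per treatment
contrasts = [
    [7, 7, -3, -3],            # C1: 1 and 2 vs. 3 and 4
    [1, -2, 0, 0],             # C2: 1 vs. 2
    [0, 0, 9, -5],             # C3: 3 vs. 4
]

ss = [contrast_ss(c, totals, ns) for c in contrasts]
print([round(value, 1) for value in ss])
print(all(orthogonal(contrasts[i], contrasts[j], ns)
          for i in range(3) for j in range(i + 1, 3)))
```

Because the three contrasts are mutually orthogonal under the assumed condition, their sums of squares add up to the treatment sum of squares, 432.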
Example 11.11

The experiment described in Example 10.11 was performed, and the
data shown in Table 11.24 were recorded. The appropriate calculations
are:

$$\sum Y^2 = 32{,}378$$

$$M_{yy} = (800)^2/20 = 32{,}000$$

$$T_{yy} = [(180)^2 + (160)^2 + (160)^2 + (164)^2 + (136)^2]/4 - 32{,}000 = 248$$

$$E_{yy} = 32{,}378 - 32{,}000 - 248 = 130$$

$$(C_1)_{yy} = \frac{[(-1)(180) + (4)(160) + (-1)(160) + (-1)(164) + (-1)(136)]^2}{4[(-1)^2 + (4)^2 + (-1)^2 + (-1)^2 + (-1)^2]} = 0$$

$$(C_2)_{yy} = \frac{[(1)(180) + (0)(160) + (1)(160) + (-1)(164) + (-1)(136)]^2}{4[(1)^2 + (0)^2 + (1)^2 + (-1)^2 + (-1)^2]} = 100$$

$$(C_3)_{yy} = \frac{[(1)(180) + (0)(160) + (-1)(160) + (0)(164) + (0)(136)]^2}{4[(1)^2 + (0)^2 + (-1)^2 + (0)^2 + (0)^2]} = 50$$

$$(C_4)_{yy} = \frac{[(0)(180) + (0)(160) + (0)(160) + (1)(164) + (-1)(136)]^2}{4[(0)^2 + (0)^2 + (0)^2 + (1)^2 + (-1)^2]} = 98.$$

These results are then summarized as in Table 11.25. Using
$\alpha = 0.05$, all contrasts except $C_1$ are judged to be
statistically significant. (NOTE: See Problem 11.31 for the expected
mean squares.)

TABLE 11.24-Data From Experiment Described in Example 10.11
and Discussed in Example 11.11

                  Electrolytes

     1        2        3        4        5

    40       38       44       41       34
    45       40       42       43       35
    46       38       40       40       34
    49       44       34       40       33

TABLE 11.25-ANOVA for Experiment Described in Example 10.11
(Data in Table 11.24; Discussion in Example 11.11)

Source of Variation      Degrees of   Sum of    Mean     F-Ratio
                         Freedom      Squares   Square

Mean                          1       32,000    32,000
Electrolytes                  4          248        62     7.15
    $C_1$                       1            0         0     0
    $C_2$                       1          100       100    11.53
    $C_3$                       1           50        50     5.77
    $C_4$                       1           98        98    11.30
Experimental error           15          130      8.67

Total                        20       32,378
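
The entries of Table 11.25 can be rebuilt directly from the observations in Table 11.24. The following Python sketch is an illustration, not a general ANOVA routine: the variable names are hypothetical, and it assumes equal numbers of observations per treatment, as in this example.

```python
# Observations from Table 11.24, one list per electrolyte.
data = {
    1: [40, 45, 46, 49],
    2: [38, 40, 38, 44],
    3: [44, 42, 40, 34],
    4: [41, 43, 40, 40],
    5: [34, 35, 34, 33],
}
# Contrast coefficients as used in Example 11.11.
contrasts = {
    "C1": [-1, 4, -1, -1, -1],
    "C2": [1, 0, 1, -1, -1],
    "C3": [1, 0, -1, 0, 0],
    "C4": [0, 0, 0, 1, -1],
}

n = 4                                            # observations per treatment
totals = [sum(obs) for obs in data.values()]     # treatment totals T_i
t = len(totals)

m_yy = sum(totals) ** 2 / (t * n)                              # mean
t_yy = sum(T ** 2 for T in totals) / n - m_yy                  # treatments
e_yy = sum(y ** 2 for obs in data.values() for y in obs) - m_yy - t_yy
error_ms = e_yy / (t * (n - 1))                                # experimental error MS


def contrast_ss(coeffs):
    """(sum c_i T_i)^2 / (n * sum c_i^2) for equal n."""
    value = sum(c * T for c, T in zip(coeffs, totals))
    return value ** 2 / (n * sum(c ** 2 for c in coeffs))


ss = {name: contrast_ss(c) for name, c in contrasts.items()}
f_ratio = {name: value / error_ms for name, value in ss.items()}

for name in contrasts:
    print(name, ss[name], round(f_ratio[name], 2))
```

The F-ratios printed here use the unrounded error mean square (130/15), so they may differ in the second decimal place from the tabled values, which are based on 8.67.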
Up to this point, the discussion of contrasts has centered on: (1)
ANOVA techniques for isolating the sums of squares associated with
each contrast and (2) the use of the corresponding mean squares to test
the hypothesis that the true effects estimated by the contrasts are 0.
However, the problem of estimation should not be overlooked.

If the true effect estimated by a contrast $C_j$ is denoted by the
symbol $\phi_j$, it is desirable to construct a confidence interval
estimate of $\phi_j$. That is, two numbers, L and U, are sought such
that we can be $100\gamma$ per cent confident that $\phi_j$ will be
between L and U. To determine L and U, the standard error of a contrast
is needed. Defining the estimated variance of a contrast by

$$\hat{V}(C_j) = \hat{V}\left(\sum_{i=1}^{t} c_{ij}T_i\right), \qquad (11.55)$$

the standard error of a contrast is given by $\sqrt{\hat{V}(C_j)}$.

The nature of $\hat{V}(T_i)$ will, of course, depend on whatever
assumptions are made concerning the observations. If we are dealing
with a completely randomized design involving t treatments and n
experimental units per treatment in which no subsampling has been
performed and if the usual assumptions (see Section 11.2) have been
made, then

$$\hat{V}(C_j) = ns^2\sum_{i=1}^{t} c_{ij}^2 \qquad (11.56)$$

and

$$L,\ U = C_j \mp t_{[(1+\gamma)/2](\nu)}\sqrt{\hat{V}(C_j)} \qquad (11.57)$$

where $\nu$ stands for the number of degrees of freedom associated with
$s^2$ in Equation (11.56).

Example 11.12

Consider the experiment discussed in Examples 10.11 and 11.11. The
data were presented in Table 11.24 and the ANOVA in Table 11.25.
For this case, we have:

$$\hat{V}(C_1) = 4(8.67)[(-1)^2 + (4)^2 + (-1)^2 + (-1)^2 + (-1)^2]$$

$$\hat{V}(C_2) = 4(8.67)[(1)^2 + (0)^2 + (1)^2 + (-1)^2 + (-1)^2]$$

$$\hat{V}(C_3) = 4(8.67)[(1)^2 + (0)^2 + (-1)^2 + (0)^2 + (0)^2]$$

$$\hat{V}(C_4) = 4(8.67)[(0)^2 + (0)^2 + (0)^2 + (1)^2 + (-1)^2].$$

Since $s^2 = 8.67$ had 15 degrees of freedom, confidence intervals for
$\phi_j\ (j = 1, 2, 3, 4)$ may easily be constructed using Equations
(10.23) and (11.57).
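
As a hedged illustration of how such an interval might be computed, the Python sketch below works through contrast $C_2$ of Example 11.12. The function name `contrast_interval` is hypothetical; the choice of a 95 per cent confidence level is an assumption made for the illustration, and the value 2.131 is the tabled two-sided 5 per cent point of Student's t with 15 degrees of freedom, supplied by hand rather than computed.

```python
import math


def contrast_interval(coeffs, totals, n, s2, t_crit):
    """Return (C, V_hat, L, U) for a contrast among treatment totals.

    V_hat = n * s^2 * sum(c_i^2); the limits are C -/+ t_crit * sqrt(V_hat).
    """
    c_value = sum(c * T for c, T in zip(coeffs, totals))
    v_hat = n * s2 * sum(c ** 2 for c in coeffs)
    half_width = t_crit * math.sqrt(v_hat)
    return c_value, v_hat, c_value - half_width, c_value + half_width


totals = [180, 160, 160, 164, 136]   # treatment totals from Table 11.24
T_975_15 = 2.131                     # tabled t value, 15 d.f. (assumed 95% level)

c2, v2, low, high = contrast_interval(
    [1, 0, 1, -1, -1], totals, n=4, s2=8.67, t_crit=T_975_15
)
print(f"C2 = {c2}, V(C2) = {v2:.2f}, interval = ({low:.2f}, {high:.2f})")
```

The resulting interval does not include 0, which agrees with the finding in Example 11.11 that $C_2$ is significant at the 5 per cent level.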

11.10     ALL  POSSIBLE  COMPARISONS  AMONG  TREATMENT
MEANS

In Sections 11.8 and 11.9, the usual method of analyzing comparisons
among treatment means was discussed in considerable detail. However,
one very important (statistical) restriction on the use of the
described method was not mentioned. This restriction is as follows: The
comparisons to be studied should be selected in advance of any analysis
of the data. That is, the method of analyzing contrasts described in
the preceding sections would not, in general, be valid if the
comparisons were decided upon after a perusal of the data and (perhaps)
a preliminary ANOVA. In other words, the comparisons should have been
decided upon during the planning stage.

The  restriction  stated  in  the  preceding  paragraph  can,  however,  work 
a  hardship  on  the  researcher.  Much  experimentation  is  of  a  purely 
exploratory  nature  and  little,  if  any,  idea  of  which  comparisons  might 
be  of  interest  is  available  prior  to  the  collection  and  analysis  of  the 
data.  In  such  cases,  the  researcher  would  like  to  gain  more  from  the 
analysis  than  a  simple  statement  that  the  treatment  means  are,  or  are 
not,  statistically  significant.  He  would  also  like  to  know,  for  example, 
if  some  of  the  treatments  might  be  considered  equivalent  and  which 
treatment  is  "best." 

How can the researcher attain the goals stated in the preceding
paragraph? This problem has received much attention from statisticians
in recent years, and some of those who have made contributions in the
area are: Bechhofer (4), Duncan (15, 16), Dunnett (17), Hartley (22),
Keuls (24), Kramer (25, 26), Newman (27), Scheffé (30), and Tukey
(33, 34, 35, 36). Incidentally, the methods of Duncan, Scheffé, and
Tukey (the major protagonists) are discussed in detail in Federer (19),
and numerical illustrations are given for each method.

Before proceeding to discuss the method which I favor, time will be
taken to mention an associated technique which has been widely used
by researchers for many years. This technique involves what is known
as a least significant difference or LSD, which is defined by

$$\text{LSD} = t_{(1-\alpha/2)(\nu)}[\hat{V}(\bar{Y}_i - \bar{Y}_j)]^{1/2} \qquad (11.58)$$

where $\nu$ represents the degrees of freedom associated with the
variance estimate used in $\hat{V}(\bar{Y}_i - \bar{Y}_j)$. The LSD
technique operates as follows: If the absolute value of the difference
between any two treatment means exceeds the LSD, the effects of the two
treatments are judged to be significantly different; if the absolute
value of the difference does not exceed the LSD, no such conclusion is
reached. The reader is warned that indiscriminate use of the LSD
technique is dangerous for, if we have enough treatments, the
probability is high that at least one of the t(t - 1)/2 differences
will, due to chance alone, be judged significantly different. Thus,
the use of the LSD is to be discouraged. (NOTE: When t = 2, the LSD is
a legitimate, but redundant, device.)
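
The warning can be made concrete with a small Python sketch. It is illustrative only and rests on several assumptions that are not in the text: the variance of a difference between two means of n observations each is taken as $2s^2/n$ (completely randomized design, equal n, no subsampling), the data are those of Table 11.24, the t value 2.131 is the tabled two-sided 5 per cent point for 15 degrees of freedom, and the final figure treats the pairwise tests as if they were independent, which they are not, so it is only a rough indication.

```python
import math
from itertools import combinations


def least_significant_difference(t_crit, s2, n):
    """LSD = t * sqrt(V_hat(difference)), with V_hat = 2 * s^2 / n (assumed)."""
    return t_crit * math.sqrt(2.0 * s2 / n)


# Treatment means computed from Table 11.24 (totals divided by n = 4).
means = {1: 45.0, 2: 40.0, 3: 40.0, 4: 41.0, 5: 34.0}
s2 = 130 / 15          # experimental error mean square, 15 d.f.
lsd = least_significant_difference(2.131, s2, n=4)

pairs = list(combinations(means, 2))   # all t(t - 1)/2 pairs of treatments
flagged = [(i, j) for i, j in pairs if abs(means[i] - means[j]) > lsd]

alpha = 0.05
# Rough illustration only: chance of at least one false "significant"
# difference if the comparisons were independent tests at level alpha.
rough_rate = 1 - (1 - alpha) ** len(pairs)

print(f"LSD = {lsd:.2f}; {len(flagged)} of {len(pairs)} differences exceed it")
print(f"rough chance of at least one false difference: {rough_rate:.2f}")
```

With only five treatments there are already ten pairwise differences, which is why the nominal 5 per cent level gives little protection when every pair is examined.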

The method to be used in this book for making (when desirable) all
possible comparisons among treatment means is that proposed by
Scheffé (30). While this method has not been the one most widely
adopted, it does have certain advantages. These advantages are:
(1) it is closely related to the concept of a contrast, (2) it uses
tables that are widely available (viz., F-tables), and (3) it is easy
to use. Let us now see how the technique works.

Recalling that a contrast is defined by

$$C_j = \sum_{i=1}^{t} c_{ij}T_i, \qquad (11.59)$$

the procedure is to calculate

$$A[\hat{V}(C_j)]^{1/2} \qquad (11.60)$$

where

$$A^2 = (t - 1)F_{(1-\alpha)(\nu_1,\nu_2)}, \qquad (11.61)$$

$$\hat{V}(C_j) = \sum_{i=1}^{t} c_{ij}^2\hat{V}(T_i), \qquad (11.62)$$

$$\nu_1 = t - 1, \qquad (11.63)$$

and $\nu_2$ stands for the degrees of freedom associated with the
denominator mean square used in the F-test of
$H: \tau_1 = \tau_2 = \cdots = \tau_t$. Then, if
$|C_j| > A[\hat{V}(C_j)]^{1/2}$, the hypothesis $H: \phi_j = 0$ will be
rejected. (See Section 11.9 for the definition of $\phi_j$.) That is,
if the absolute value of $C_j$ exceeds $A[\hat{V}(C_j)]^{1/2}$, the
contrast $C_j$ will be said to differ significantly from 0. [NOTE: The
original F-test rejects $H: \tau_i = 0\ (i = 1, \ldots, t)$ if and only
if at least one $C_j$ is significantly different, by Scheffé's
technique, from 0. The application of Scheffé's procedure permits us,
then, to determine which of the $C_j$ are significant.]

Example 11.13

Consider the experiment described in Examples 10.10 and 11.9. The
data were presented in Table 7.20 and analyses in Tables 7.21 and 11.23.
The value of A to be used in making any desired comparison is found
to be 3.98 since

$$A^2 = (t - 1)F_{(1-\alpha)(\nu_1,\nu_2)} = (3)F_{.99(3,16)} = (3)(5.29) = 15.87.$$

Let us examine contrast $C_2$ described in Example 10.10, namely,

$$C_2 = (1)T_1 + (-2)T_2 + (0)T_3 + (0)T_4 = (1)(184) + (-2)(68) = 48.$$

The estimated variance of $C_2$ is given by

$$\hat{V}(C_2) = (1)^2(4s^2) + (-2)^2(2s^2) = 12s^2 = 12(2) = 24.$$

Therefore,

$$A[\hat{V}(C_2)]^{1/2} = (3.98)\sqrt{24} = (3.98)(4.899) = 19.498.$$

Since $|C_2| = 48 > 19.498$, we conclude that the difference between
the effects of treatment No. 1 and treatment No. 2 is statistically
significant. Incidentally, this agrees with the conclusion reached in
Example 11.9. Other comparisons among the treatment effects could be
made in a like manner.

Example 11.14

Consider the experiment described in Examples 10.11 and 11.11. The
data were presented in Table 11.24 and the analysis in Table 11.25. In
this illustration, $A^2 = 4(4.89) = 19.56$ and thus A = 4.42. If we are
interested in $C_2$ as defined in Table 10.3, it may be verified that
$C_2 = 40$, $\hat{V}(C_2) = 16s^2 = 16(8.67) = 138.72$,
$[\hat{V}(C_2)]^{1/2} = 11.78$, and $A[\hat{V}(C_2)]^{1/2} = 52.07$.
Since $|C_2| = 40 < A[\hat{V}(C_2)]^{1/2} = 52.07$, we conclude that
$C_2$ is not significantly different from 0.

It is noted, however, that this conclusion is the opposite of that
reached in Example 11.11. Why is this? The reason may be explained
as follows: Scheffé's method will not lead to significant results (if
the appropriate null hypothesis is true) as frequently as will the
classical approach of orthogonal comparisons because we have been
permitted to examine the data before deciding on the analysis. This,
obviously, should lead to fewer cases of claiming significance when no
real differences exist. This is as it should be, for, if we can look at
the data before deciding on the comparisons to be investigated, we
should be able to lessen our chances of making errors. From the point
of view of estimation, this decrease in the "frequency of errors" takes
the form of longer confidence intervals (i.e., our estimates are less
precise) than those provided by the classical approach.
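
The decision rule used in Examples 11.13 and 11.14 can be sketched in Python as follows. The function name `scheffe_test` is hypothetical, and the critical F values (5.29 and 4.89) are the tabled values quoted in the examples, supplied by hand rather than computed. Because the sketch carries A to full precision instead of rounding it to two decimals, the critical values it prints differ slightly from 19.498 and 52.07.

```python
import math


def scheffe_test(contrast_value, var_contrast, t, f_crit):
    """Return (significant, critical_value), critical_value = A * sqrt(V_hat(C)).

    A^2 = (t - 1) * F, where F is the tabled upper percentage point with
    t - 1 and nu_2 degrees of freedom.
    """
    a = math.sqrt((t - 1) * f_crit)
    critical = a * math.sqrt(var_contrast)
    return abs(contrast_value) > critical, critical


# Example 11.13: t = 4, F(.99; 3, 16) = 5.29, C2 = 48, V_hat(C2) = 24.
sig_13, crit_13 = scheffe_test(48, 24.0, t=4, f_crit=5.29)

# Example 11.14: t = 5, F(.99; 4, 15) = 4.89, C2 = 40, V_hat(C2) = 138.72.
sig_14, crit_14 = scheffe_test(40, 138.72, t=5, f_crit=4.89)

print(f"Example 11.13: critical value {crit_13:.2f}, significant: {sig_13}")
print(f"Example 11.14: critical value {crit_14:.2f}, significant: {sig_14}")
```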

11.11  RESPONSE  CURVES:  A  REGRESSION  ANALYSIS  OF
TREATMENT  MEANS  WHEN  THE  VARIOUS  TREATMENTS
ARE  DIFFERENT  LEVELS  OF  ONE  QUANTITATIVE  FACTOR

The  reader  may  be  wondering  why  the  subject  of  this  section  is 
under  discussion  at  this  time.  Did  we  not  discuss  regression  analyses 
completely  enough  in  Chapter  8?  Of  course  we  did,  but  now  we  wish 
to  utilize  the  techniques  of  regression  to  make  more  complete  and  in 
formative  analyses  of  data  arising  from  completely  randomized  designs 
in  which  the  treatments  are  different  levels  of  a  single  quantitative 
factor. 
How is this possible? Let us suppose that the treatments being
examined are: (1) different levels (or rates) of application of the
same fertilizer, (2) different weights of an object being moved in a
time-and-motion study project, or (3) different intensities of a given
stimulus in a psychological experiment. If situations such as these
arise, it seems reasonable to investigate how the measured
characteristic varies with changes in the level of the treatment. That
is, we would like to know if the change in the measured characteristic
takes place in a linear, quadratic, . . . fashion as the level of the
treatment is increased or decreased. In other words, we wish to gain
some idea of the shape of the response curve so that an estimate may be
made of the optimum level of the treatment.

Just  how  will  the  type  of  analysis  indicated  above  be  carried  out? 
The  first  step  is  to  plot  the  treatment  means,  thus  gaining  some  idea 
as  to  the  general  shape  of  the  response  curve.  Once  this  has  been  done, 
the  researcher  will  be  ready  to  undertake  a  more  rigorous  analysis  of 
his  sample  data. 

Equations  for  various  possible  response  curves  could,  of  course,  be 
determined  using  the  techniques  of  Chapter  8.  However,  the  deter 
mination  of  the  equation  of  the  response  curve  is  not  the  immediate 
aim  of  our  analysis.  The  immediate  aim  is  to  reach  an  objective  de 
cision  (based  on  more  than  a  simple  plotting  of  the  means)  as  to  the 
nature  of  the  regression  function  that  will  best  describe  the  effect  of 
the  treatment  on  the  response  variable. 

Perhaps  the  most  convenient  way  of  reaching  the  goal  stated  in  the 
preceding  paragraph  is  to  determine  how  much  of  the  treatment  sum 
of  squares  would  be  associated  with  each  of  the  terms  (linear,  quad 
ratic,  .  .  .  )  in  a  polynomial  regression.  If  the  various  levels  of  the 
treatment  being  studied  are  equally  spaced,  this  analysis  can  best  be 
carried  out  using  the  method  of  orthogonal  polynomials  introduced  in 
Section  8.20.  (NOTE:  The  assumption  of  equal  spacing  will,  in  general, 
present  no  problem,  for  both  the  researcher  and  the  statistician  will 
ordinarily  plan  the  experiment  in  such  a  way  as  to  insure  that  the 
assumption  will  be  satisfied.  That  is,  in  most  applications  equal  spacing 
is  the  usual  state  of  affairs.) 

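If each treatment total ($T_i$) is the sum of n observations, the desired
sums of squares are found using

$$\text{S.S. due to the } k\text{th degree term} = \frac{\left(\sum_{i=1}^{t} \xi'_{ki} T_i\right)^2}{n \sum_{i=1}^{t} (\xi'_{ki})^2}; \qquad k = 1, \ldots, t-1; \tag{11.64}$$
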

where the $\xi'_{ki}$ are orthogonal polynomial coefficients. Extensive tables
of orthogonal polynomial coefficients are given in Anderson and
Houseman (1); for your convenience an abbreviated tabulation is
provided in Table 11.26.

TABLE 11.26-Partial Table of Orthogonal Polynomial Coefficients

        t = 2      t = 3          t = 4                t = 5
  i      k=1     k=1  k=2     k=1  k=2  k=3     k=1  k=2  k=3  k=4
  1      -1      -1   +1      -3   +1   -1      -2   +2   -1   +1
  2      +1       0   -2      -1   -1   +3      -1   -1   +2   -4
  3              +1   +1      +1   -1   -3       0   -2    0   +6
  4                           +3   +1   +1      +1   -1   -2   -4
  5                                             +2   +2   +1   +1

In agreement with the notation previously adopted, the sums of squares
associated with the linear, quadratic, cubic, ... terms will be denoted
by $(T_L)_{yy}$, $(T_Q)_{yy}$, $(T_C)_{yy}$, ... . In addition, since it is unlikely
that the researcher will wish to isolate more than a few terms when
studying the treatment sum of squares, the balance (if any) will be
represented by $(T_{dev})_{yy}$. For example, if the linear, quadratic, and
cubic effects were isolated, the sum of the squares of the deviations
from regression would be given by

$$(T_{dev})_{yy} = T_{yy} - (T_L)_{yy} - (T_Q)_{yy} - (T_C)_{yy}. \tag{11.65}$$



The results of the foregoing calculations may then be summarized as in
Table 11.27.

TABLE 11.27-Generalized ANOVA for a Completely Randomized Design
Showing the Isolation of the Linear, Quadratic, and Cubic
Components of the Treatment Sum of Squares

Source of Variation     Degrees of    Sum of          Mean       F-Ratio
                        Freedom       Squares         Square
Mean                    1             M_yy            M
Treatments              t - 1         T_yy            T          T/E
   T_L                  1             (T_L)_yy        T_L        T_L/E
   T_Q                  1             (T_Q)_yy        T_Q        T_Q/E
   T_C                  1             (T_C)_yy        T_C        T_C/E
   T_dev                t - 4         (T_dev)_yy      T_dev      T_dev/E
Experimental error      t(n - 1)      E_yy            E
Total                   tn            ΣY²

Example 11.15

Consider the data in Table 11.28. Although an examination of the
treatment totals suggests that a linear response function may be
appropriate, the quadratic effect will also be isolated for illustrative
purposes.

TABLE 11.28-Yields (Converted to Bushels/Acre) of a Certain Grain
Crop in a Fertilizer Trial

                          Level of Fertilizer
           No           10 lbs.     20 lbs.     30 lbs.     40 lbs.
           Treatment    per Plot    per Plot    per Plot    per Plot
           20           25          36          35          43
           25           29          37          39          40
           23           31          29          31          36
           27           30          40          42          48
           19           27          33          44          47
Totals     114          142         175         191         214
Means      22.8         28.4        35          38.2        42.8

The following sums of squares were obtained:

$$\begin{aligned}
\sum Y^2 &= 29{,}560 \\
M_{yy} &= (836)^2/25 = 27{,}955.84 \\
T_{yy} &= [(114)^2 + (142)^2 + (175)^2 + (191)^2 + (214)^2]/5 - 27{,}955.84 = 1256.56 \\
E_{yy} &= 29{,}560 - 27{,}955.84 - 1256.56 = 347.60 \\
(T_L)_{yy} &= \frac{[(-2)(114) + (-1)(142) + (0)(175) + (1)(191) + (2)(214)]^2}{5[(-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2]} = \frac{(249)^2}{50} = 1240.02 \\
(T_Q)_{yy} &= \frac{[(2)(114) + (-1)(142) + (-2)(175) + (-1)(191) + (2)(214)]^2}{5[(2)^2 + (-1)^2 + (-2)^2 + (-1)^2 + (2)^2]} = \frac{(-27)^2}{70} = 10.41 \\
(T_{dev})_{yy} &= 1256.56 - 1240.02 - 10.41 = 6.13.
\end{aligned}$$
These are summarized in Table 11.29.

TABLE 11.29-ANOVA for Data of Table 11.28

Source of Variation     Degrees of    Sum of        Mean          F-Ratio
                        Freedom       Squares       Square
Mean                    1             27,955.84     27,955.84
Fertilizer levels       4              1,256.56        314.14     18.07
   T_L                  1              1,240.02      1,240.02     71.35
   T_Q                  1                 10.41         10.41      0.60
   T_dev                2                  6.13          3.07      0.18
Experimental error      20               347.60         17.38
Total                   25            29,560.00

Examination of the F-ratios
confirms our subjective judgment that the response of yield to rate of
application  of  the  fertilizer  is  linear  within  the  range  of  the  levels  of 
fertilizer  applied.  This  suggests  that  the  rate  of  application  of  the 
fertilizer  might  be  increased  even  more,  with  an  accompanying  increase 
in the yield. However, the reader is warned that extrapolation of the
linear relationship much beyond 40 lbs./plot could (possibly) lead to
erroneous conclusions. Another way of putting this is to say that the
optimum  level  of  fertilizer  application  has  probably  not  yet  been 
reached,  and  further  experimentation  should  be  carried  out  along  these 
lines. 
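The partition carried out in Example 11.15 can be retraced with a short calculation. The following is only a sketch in Python (the text itself prescribes no programming language); the function name `component_ss` is a hypothetical helper, and the coefficients are the t = 5 linear and quadratic columns of Table 11.26.

```python
# Treatment totals from Table 11.28 (five equally spaced levels, n = 5 plots each).
totals = [114, 142, 175, 191, 214]
n = 5

# Orthogonal polynomial coefficients for t = 5 (Table 11.26).
coefficients = {
    "linear": [-2, -1, 0, 1, 2],
    "quadratic": [2, -1, -2, -1, 2],
}

def component_ss(totals, coeffs, n):
    """Sum of squares for one polynomial term, following Equation (11.64)."""
    contrast = sum(c * t for c, t in zip(coeffs, totals))
    return contrast ** 2 / (n * sum(c * c for c in coeffs))

correction = sum(totals) ** 2 / (n * len(totals))            # M_yy
treatment_ss = sum(t * t for t in totals) / n - correction   # T_yy

parts = {name: component_ss(totals, c, n) for name, c in coefficients.items()}
deviations_ss = treatment_ss - sum(parts.values())            # (T_dev)_yy

for name, ss in parts.items():
    print(f"{name:>10}: {ss:9.2f}")
print(f"deviations: {deviations_ss:9.2f}")
```

The printed values should agree with those of Table 11.29 to within rounding in the second decimal place, since the text rounds each component before subtracting.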

11.12  ANALYSIS OF A COMPLETELY RANDOMIZED DESIGN INVOLVING
FACTORIAL TREATMENT COMBINATIONS

By  now  the  reader  should  be  gaining  some  facility  in  the  calculation 
of  sums  of  squares  associated  with  various  sources  of  variation.  Thus, 
the  advent  of  another  special  situation,  namely,  factorial  treatment 
combinations,  should  present  no  new  problems.  In  fact,  once  the  reader 
realizes  that  the  factorial  analysis  is  simply  another  way  of  partitioning 
the  treatment  sum  of  squares,  he  is  well  on  the  way  to  a  solution. 
Let  us  now  examine  the  details. 

It has previously been noted that the usual statistical model associated
with a completely randomized design involving t treatments and
n experimental units per treatment is

$$Y_{ij} = \mu + \tau_i + \epsilon_{ij}; \qquad i = 1, \ldots, t; \quad j = 1, \ldots, n. \tag{11.66}$$

If we are now informed that the t treatments are actually all
combinations of a levels of factor a and b levels of factor b (that is,
t = ab), the statistical model may be rewritten as

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}; \qquad i = 1, \ldots, a; \quad j = 1, \ldots, b; \quad k = 1, \ldots, n \tag{11.67}$$

where $\mu$ is the true mean effect, $\alpha_i$ is the true effect of the ith level of
factor a, $\beta_j$ is the true effect of the jth level of factor b, $(\alpha\beta)_{ij}$ is the
true effect of the interaction of the ith level of factor a with the jth
level of factor b, and $\epsilon_{ijk}$ is the true effect of the kth experimental unit
subjected to the (ij)th treatment combination. As usual, it is assumed
that $\mu$ is a constant and that the $\epsilon_{ijk}$ are NID $(0, \sigma^2)$. Rather than discuss
assumptions concerning $\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$ at this time, our attention
will be directed towards the calculation of the various sums of squares.
When the method of calculation has been explained, we shall return
to the assumptions and, as a consequence, to the expected mean
squares, estimation and test procedures, and other related topics.

A moment's reflection will confirm that the basic calculations are
unchanged. That is, $\sum Y^2$, $M_{yy}$, $T_{yy}$, and $E_{yy}$ will all be calculated as
before. However, if we adopt the following notation:

$A_i$ = total of all observations associated with the ith level
      of factor a

$$A_i = \sum_{j=1}^{b} \sum_{k=1}^{n} Y_{ijk}, \tag{11.68}$$

$B_j$ = total of all observations associated with the jth level
      of factor b

$$B_j = \sum_{i=1}^{a} \sum_{k=1}^{n} Y_{ijk}, \tag{11.69}$$

and

$T_{ij}$ = total of all observations associated with both the ith
       level of factor a and the jth level of factor b
     = entry in the (ij)th cell of the a × b table

$$T_{ij} = \sum_{k=1}^{n} Y_{ijk}, \tag{11.70}$$


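it may be shown that

$A_{yy}$ = sum of squares associated with the different levels of a

$$A_{yy} = \sum_{i=1}^{a} A_i^2/bn - M_{yy} = bn \sum_{i=1}^{a} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2, \tag{11.71}$$

$B_{yy}$ = sum of squares associated with the different levels of b

$$B_{yy} = \sum_{j=1}^{b} B_j^2/an - M_{yy} = an \sum_{j=1}^{b} (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2, \tag{11.72}$$

and

$S_{ab}$ = among subclasses (cells) sum of squares for the a × b table¹

$$S_{ab} = \sum_{i=1}^{a} \sum_{j=1}^{b} T_{ij}^2/n - M_{yy} = n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2. \tag{11.73}$$

¹ The reader will recognize that, in this particular situation, $S_{ab} = T_{yy}$.
However, the new notation and terminology were introduced at this time to
acquaint the reader with a system (of notation, terminology, and
calculation) that will prove most valuable when factorials involving more
than two factors are analyzed.

Using the preceding results, it may be verified that

$(AB)_{yy}$ = sum of squares associated with the interaction of
          factors a and b

$$(AB)_{yy} = S_{ab} - A_{yy} - B_{yy} = n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2. \tag{11.74}$$
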


These  results  are  summarized  in  ANOVA  form  in  Table  11.30. 

TABLE 11.30-ANOVA for a Two-Factor Factorial in a Completely
Randomized Design

Source of Variation     Degrees of        Sum of        Mean
                        Freedom           Squares       Square
Mean                    1                 M_yy          M
Treatments
   A                    a - 1             A_yy          A
   B                    b - 1             B_yy          B
   AB                   (a - 1)(b - 1)    (AB)_yy       AB
Experimental error      ab(n - 1)         E_yy          E
Total                   abn               ΣY²

Having explained the calculation of the various sums of squares,
we now direct your attention to the assumptions associated with the
$\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$. There are four possible sets of assumptions that
can be made with respect to the true treatment effects. These are
discussed below.

Model I: Analysis of Variance (Fixed Effects) Model

This model is assumed when the researcher is concerned only with the
a levels of factor a and the b levels of factor b present in the experiment.
Mathematically, these assumptions are summarized by:

$$\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0.$$



Model II: Component of Variance (Random Effects) Model

This model is assumed when the researcher is concerned with: (1) a
population of levels of factor a of which only a random sample (the a
levels) are present in the experiment and (2) a population of levels of
factor b of which only a random sample (the b levels) are present in
the experiment. Mathematically, these assumptions are summarized
as follows:

$\alpha_i$ are NID $(0, \sigma_\alpha^2)$
$\beta_j$ are NID $(0, \sigma_\beta^2)$
$(\alpha\beta)_{ij}$ are NID $(0, \sigma_{\alpha\beta}^2)$.

Model III: Mixed Model (a Fixed, b Random)

This model is assumed when the researcher is concerned with: (1)
only the a levels of factor a present in the experiment and (2) a
population of levels of factor b of which only a random sample (the b
levels) are present in the experiment. Mathematically, these
assumptions are summarized as follows:

$$\sum_{i=1}^{a} \alpha_i = \sum_{i=1}^{a} (\alpha\beta)_{ij} = 0$$

$\beta_j$ are NID $(0, \sigma_\beta^2)$.

Please note that $\sum_{j=1}^{b} (\alpha\beta)_{ij}$ was not assumed to be 0.

Model III: Mixed Model (a Random, b Fixed)

This model is assumed when the researcher is concerned with: (1) a
population of levels of factor a of which only a random sample (the a
levels) are present in the experiment and (2) only the b levels of
factor b present in the experiment. Mathematically, these assumptions
are summarized as follows:

$\alpha_i$ are NID $(0, \sigma_\alpha^2)$

$$\sum_{j=1}^{b} \beta_j = \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0.$$

Please note that $\sum_{i=1}^{a} (\alpha\beta)_{ij}$ was not assumed to be 0.

While the logic underlying the preceding mathematical formulations
is beyond the scope of this text, it is hoped that the validity of the
expressions will be substantiated by the arguments which will accompany
the specification of the several F-tests. Thus, it is requested that
the reader accept the expressions in good faith and concentrate on
learning the methods of analysis. In the long run, this will prove most
beneficial.

Based  on  the  foregoing  assumptions,  the  expected  mean  squares  may 
now  be  derived.  As  in  the  preceding  examples,  the  derivations  will  be 
omitted  and  only  the  results  tabulated.  The  expected  mean  squares 
for  each  of  the  four  cases  are  shown  in  Table  11.31. 

Examination of the expected mean squares in Table 11.31 will indicate
the proper F-tests for such hypotheses as $H_1: \alpha_i = 0$ (i = 1, ..., a),
$H_2: \beta_j = 0$ (j = 1, ..., b), $H_3: (\alpha\beta)_{ij} = 0$ (i = 1, ..., a; j = 1, ..., b),
$H_4: \sigma_\alpha^2 = 0$, $H_5: \sigma_\beta^2 = 0$, and $H_6: \sigma_{\alpha\beta}^2 = 0$. For your convenience
these are specified in Table 11.32.

[TABLE 11.31, giving the expected mean squares for the two-factor
factorial under each of the four sets of assumptions, is illegible in
the source.]

TABLE 11.32-F-Ratios for Testing the Appropriate Hypotheses When Dealing
With a Two-Factor Factorial in a Completely Randomized Design (See
Table 11.30 for the ANOVA and Table 11.31 for the Expected
Mean Squares)

                                           F-Ratio
Source of Variation     Model I    Model II    Model III      Model III
                                               (a fixed,      (a random,
                                               b random)      b fixed)
Mean
Treatments
   A                    A/E        A/AB        A/AB           A/E
   B                    B/E        B/AB        B/E            B/AB
   AB                   AB/E       AB/E        AB/E           AB/E
Experimental error
Total

Before attempting a discussion of the reasons why the expected mean
squares (and thus the F-tests) are as indicated, a three-factor factorial
will be considered. When this has been done, a general discussion of
test procedures will be undertaken and numerical examples presented.

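When a three-factor factorial is associated with a completely
randomized design involving n experimental units per treatment
combination, the appropriate statistical model is

$$Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \epsilon_{ijkl}; \tag{11.75}$$
$$i = 1, \ldots, a; \quad j = 1, \ldots, b; \quad k = 1, \ldots, c; \quad l = 1, \ldots, n$$

in which all terms are defined in a manner analogous to the definitions
accompanying Equation (11.67). The basic sums of squares, namely,
$\sum Y^2$, $M_{yy}$, $T_{yy}$, and $E_{yy}$ are calculated in the usual way. Then, if one
forms an a × b × c table, an a × b table, an a × c table, and a b × c table,
the remaining sums of squares may be found as follows:

$S_{abc}$ = among cells sum of squares for the a × b × c table

$$S_{abc} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} T_{ijk}^2/n - M_{yy}, \tag{11.76}$$

$S_{ab}$ = among cells sum of squares for the a × b table

$$S_{ab} = \sum_{i=1}^{a} \sum_{j=1}^{b} T_{ij\cdot}^2/cn - M_{yy}, \tag{11.77}$$

$S_{ac}$ = among cells sum of squares for the a × c table

$$S_{ac} = \sum_{i=1}^{a} \sum_{k=1}^{c} T_{i\cdot k}^2/bn - M_{yy}, \tag{11.78}$$

$S_{bc}$ = among cells sum of squares for the b × c table

$$S_{bc} = \sum_{j=1}^{b} \sum_{k=1}^{c} T_{\cdot jk}^2/an - M_{yy}, \tag{11.79}$$

$$A_{yy} = \sum_{i=1}^{a} A_i^2/bcn - M_{yy}, \tag{11.80}$$

$$B_{yy} = \sum_{j=1}^{b} B_j^2/acn - M_{yy}, \tag{11.81}$$

$$C_{yy} = \sum_{k=1}^{c} C_k^2/abn - M_{yy}, \tag{11.82}$$

$$(AB)_{yy} = S_{ab} - A_{yy} - B_{yy}, \tag{11.83}$$

$$(AC)_{yy} = S_{ac} - A_{yy} - C_{yy}, \tag{11.84}$$

$$(BC)_{yy} = S_{bc} - B_{yy} - C_{yy}, \tag{11.85}$$

$$(ABC)_{yy} = S_{abc} - A_{yy} - B_{yy} - C_{yy} - (AB)_{yy} - (AC)_{yy} - (BC)_{yy}. \tag{11.86}$$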

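In the above expressions, the various totals are defined as shown
below:

$T_{ijk}$ = total of all observations associated with the ith level
        of factor a, the jth level of factor b, and the kth level
        of factor c
      = entry in the (ijk)th cell of the a × b × c table

$$T_{ijk} = \sum_{l=1}^{n} Y_{ijkl}, \tag{11.87}$$

$T_{ij\cdot}$ = total of all observations associated with the ith level
        of factor a and the jth level of factor b
      = entry in the (ij)th cell of the a × b table

$$T_{ij\cdot} = \sum_{k=1}^{c} \sum_{l=1}^{n} Y_{ijkl} = \sum_{k=1}^{c} T_{ijk}, \tag{11.88}$$

$T_{i\cdot k}$ = total of all observations associated with the ith level
        of factor a and the kth level of factor c
      = entry in the (ik)th cell of the a × c table

$$T_{i\cdot k} = \sum_{j=1}^{b} \sum_{l=1}^{n} Y_{ijkl} = \sum_{j=1}^{b} T_{ijk}, \tag{11.89}$$

$T_{\cdot jk}$ = total of all observations associated with the jth level
        of factor b and the kth level of factor c
      = entry in the (jk)th cell of the b × c table

$$T_{\cdot jk} = \sum_{i=1}^{a} \sum_{l=1}^{n} Y_{ijkl} = \sum_{i=1}^{a} T_{ijk}, \tag{11.90}$$

$A_i$ = total of all observations associated with the ith level
      of factor a

$$A_i = \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} Y_{ijkl} = \sum_{j=1}^{b} \sum_{k=1}^{c} T_{ijk}, \tag{11.91}$$

$B_j$ = total of all observations associated with the jth level of
      factor b

$$B_j = \sum_{i=1}^{a} \sum_{k=1}^{c} \sum_{l=1}^{n} Y_{ijkl} = \sum_{i=1}^{a} \sum_{k=1}^{c} T_{ijk}, \tag{11.92}$$

and

$C_k$ = total of all observations associated with the kth level
      of factor c

$$C_k = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{l=1}^{n} Y_{ijkl} = \sum_{i=1}^{a} \sum_{j=1}^{b} T_{ijk}. \tag{11.93}$$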
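The table-by-table procedure just described (form the cell totals, collapse them into the two-way and one-way tables, then subtract) can be sketched in code. The following Python function is only an illustration under stated assumptions: the name `three_factor_ss`, the nested-list layout `y[i][j][k]`, and the small data set at the end are hypothetical and are not taken from the text.

```python
from itertools import product

def three_factor_ss(y):
    """Sums of squares for a three-factor factorial.

    y[i][j][k] is the list of n observations for level i of a,
    level j of b, and level k of c (equal n in every cell).
    """
    a, b, c, n = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    cells = list(product(range(a), range(b), range(c)))
    t = {idx: sum(y[idx[0]][idx[1]][idx[2]]) for idx in cells}   # T_ijk
    m = sum(t.values()) ** 2 / (a * b * c * n)                    # M_yy

    def among(keep, divisor):
        """Among-cells sum of squares for the table indexed by the subscripts in `keep`."""
        table = {}
        for idx, total in t.items():
            key = tuple(idx[p] for p in keep)
            table[key] = table.get(key, 0) + total
        return sum(v * v for v in table.values()) / divisor - m

    s_abc = among((0, 1, 2), n)
    s_ab, s_ac, s_bc = among((0, 1), c * n), among((0, 2), b * n), among((1, 2), a * n)
    ss = {"A": among((0,), b * c * n),
          "B": among((1,), a * c * n),
          "C": among((2,), a * b * n)}
    ss["AB"] = s_ab - ss["A"] - ss["B"]
    ss["AC"] = s_ac - ss["A"] - ss["C"]
    ss["BC"] = s_bc - ss["B"] - ss["C"]
    ss["ABC"] = s_abc - sum(ss[key] for key in ("A", "B", "C", "AB", "AC", "BC"))
    total_sq = sum(obs * obs for idx in cells for obs in y[idx[0]][idx[1]][idx[2]])
    ss["E"] = total_sq - m - s_abc
    ss["M"] = m
    return ss

# Hypothetical 2 x 2 x 2 layout with n = 2 in which only factor a has an effect.
demo = [[[[10 * i + 1, 10 * i + 3] for k in range(2)] for j in range(2)] for i in range(2)]
print(three_factor_ss(demo))
```

In the hypothetical layout every sum of squares other than those for a, the mean, and experimental error should come out as zero, which gives a quick check that the subtractions are in the right order.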


The  pertinent  sums  of  squares  are  summarized  in  ANOVA  form  in 
Table  11.33. 

As was the case with a two-factor factorial, the assumptions concerning
the true treatment effects can take several forms. In fact, for a
three-factor factorial, there are eight different situations. Rather than
discuss all of these, only four representative cases will be exhibited.
These are described below.
TABLE 11.33-ANOVA for a Three-Factor Factorial in a Completely
Randomized Design

Source of Variation     Degrees of Freedom         Sum of        Mean
                                                   Squares       Square
Mean                    1                          M_yy          M
Treatments
   A                    a - 1                      A_yy          A
   B                    b - 1                      B_yy          B
   C                    c - 1                      C_yy          C
   AB                   (a - 1)(b - 1)             (AB)_yy       AB
   AC                   (a - 1)(c - 1)             (AC)_yy       AC
   BC                   (b - 1)(c - 1)             (BC)_yy       BC
   ABC                  (a - 1)(b - 1)(c - 1)      (ABC)_yy      ABC
Experimental error      abc(n - 1)                 E_yy          E
Total                   abcn                       ΣY²

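Model I: Analysis of Variance (Fixed Effects) Model

This model is assumed when the researcher is concerned only with
the a levels of factor a, the b levels of factor b, and the c levels of factor
c present in the experiment. Mathematically, these assumptions are
summarized by:

$$\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{k=1}^{c} \gamma_k = 0,$$
$$\sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{j=1}^{b} (\alpha\beta)_{ij} = \sum_{i=1}^{a} (\alpha\gamma)_{ik} = \sum_{k=1}^{c} (\alpha\gamma)_{ik} = \sum_{j=1}^{b} (\beta\gamma)_{jk} = \sum_{k=1}^{c} (\beta\gamma)_{jk} = 0,$$
$$\sum_{i=1}^{a} (\alpha\beta\gamma)_{ijk} = \sum_{j=1}^{b} (\alpha\beta\gamma)_{ijk} = \sum_{k=1}^{c} (\alpha\beta\gamma)_{ijk} = 0.$$
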


Model II: Component of Variance (Random Effects) Model

This model is assumed when the researcher is concerned with: (1) a
population of levels of factor a of which only a random sample (the a
levels) are present in the experiment, (2) a population of levels of factor
b of which only a random sample (the b levels) are present in the
experiment, and (3) a population of levels of factor c of which only a
random sample (the c levels) are present in the experiment.
Mathematically, these assumptions are summarized as follows:

$\alpha_i$ are NID $(0, \sigma_\alpha^2)$
$\beta_j$ are NID $(0, \sigma_\beta^2)$
$\gamma_k$ are NID $(0, \sigma_\gamma^2)$
$(\alpha\beta)_{ij}$ are NID $(0, \sigma_{\alpha\beta}^2)$
$(\alpha\gamma)_{ik}$ are NID $(0, \sigma_{\alpha\gamma}^2)$
$(\beta\gamma)_{jk}$ are NID $(0, \sigma_{\beta\gamma}^2)$
$(\alpha\beta\gamma)_{ijk}$ are NID $(0, \sigma_{\alpha\beta\gamma}^2)$.
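Model III: Mixed Model (a and b Fixed, c Random)

This model is assumed when the researcher is concerned with: (1)
only the a levels of factor a present in the experiment, (2) only the b
levels of factor b present in the experiment, and (3) a population of
levels of factor c of which only a random sample (the c levels) are
present in the experiment. Mathematically, these assumptions are
summarized as follows:

$$\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0$$
$$\sum_{i=1}^{a} (\alpha\gamma)_{ik} = \sum_{j=1}^{b} (\beta\gamma)_{jk} = \sum_{i=1}^{a} (\alpha\beta\gamma)_{ijk} = \sum_{j=1}^{b} (\alpha\beta\gamma)_{ijk} = 0$$

$\gamma_k$ are NID $(0, \sigma_\gamma^2)$.

Please note that

$$\sum_{k=1}^{c} (\alpha\gamma)_{ik}, \quad \sum_{k=1}^{c} (\beta\gamma)_{jk}, \quad \text{and} \quad \sum_{k=1}^{c} (\alpha\beta\gamma)_{ijk}$$

were not assumed to be 0.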


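Model III: Mixed Model (a Fixed, b and c Random)

This model is assumed when the researcher is concerned with: (1)
only the a levels of factor a present in the experiment, (2) a population
of levels of factor b of which only a random sample (the b levels) are
present in the experiment, and (3) a population of levels of factor c of
which only a random sample (the c levels) are present in the
experiment. Mathematically, these assumptions are summarized as follows:

$$\sum_{i=1}^{a} \alpha_i = \sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{i=1}^{a} (\alpha\gamma)_{ik} = \sum_{i=1}^{a} (\alpha\beta\gamma)_{ijk} = 0$$

$\beta_j$ are NID $(0, \sigma_\beta^2)$
$\gamma_k$ are NID $(0, \sigma_\gamma^2)$
$(\beta\gamma)_{jk}$ are NID $(0, \sigma_{\beta\gamma}^2)$.

Please note that

$$\sum_{j=1}^{b} (\alpha\beta)_{ij}, \quad \sum_{k=1}^{c} (\alpha\gamma)_{ik}, \quad \text{and} \quad \sum_{j=1}^{b} (\alpha\beta\gamma)_{ijk}$$

were not assumed to be 0.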

Based on the foregoing assumptions, the expected mean squares are
derived and the results presented in Table 11.34. The proper F-tests for
various hypotheses are shown in Table 11.35.

TABLE 11.35-F-Ratios for Testing the Appropriate Hypotheses When
Dealing With a Three-Factor Factorial in a Completely Randomized
Design (See Table 11.33 for the ANOVA and Table 11.34
for the Expected Mean Squares)

                                               F-Ratio
Source of Variation     Model I    Model II         Model III       Model III
                                                    (a and b        (a Fixed,
                                                    Fixed, c        b and c
                                                    Random)         Random)
Mean
Treatments
   A                    A/E        no exact test    A/AC            no exact test
   B                    B/E        no exact test    B/BC            B/BC
   C                    C/E        no exact test    C/E             C/BC
   AB                   AB/E       AB/ABC           AB/ABC          AB/ABC
   AC                   AC/E       AC/ABC           AC/E            AC/ABC
   BC                   BC/E       BC/ABC           BC/E            BC/E
   ABC                  ABC/E      ABC/E            ABC/E           ABC/E
Experimental error
Total

Having outlined the methods of calculation associated with two- and
three-factor factorials in a completely randomized design, we are now
ready to discuss the expected mean squares exhibited in Tables 11.31
and 11.34 and the F-ratios exhibited in Tables 11.32 and 11.35. Perhaps
the best way to approach this topic is to talk about the types of
inferences that the researcher wishes to make. You will recall that the
various Models (I, II, and III) reflect the researcher's desire to make
inferences about: (1) only the levels of the factors present in the
experiment, (2) populations of levels of factors of which only a random
sample (of levels from each population) is present in the experiment,
and (3) a mixture of the two preceding situations, respectively. In each
of these situations, the researcher may reason as follows:

(1)  When dealing with a situation in which Model I applies, the
conclusions reached about any particular effect will be
uncontaminated by any other effect since, by proper definition
of the terms in the statistical model, the average contribution
of every other effect can be made equal to zero. Consequently,
all F-values will be calculated by forming the ratio of the
mean square for the effect under scrutiny and the experimental
error mean square. That is, all effects are tested against
experimental error.

(2)  When dealing with a situation in which Model II applies, the
conclusions reached about any particular effect will be
contaminated by all those effects which represent interactions
between the effect under scrutiny and other effects present in
the experiment. This reflects the researcher's realization that
his conclusions (inferences) about the effect under scrutiny are
uncertain not only because of the $\epsilon$'s but also because of the
chance contributions of the randomly selected levels of any
factor. That is, a different random sample (of levels of any
factor) might lead to different conclusions, and the researcher
attempts to incorporate this uncertainty into his conclusions
by testing a particular effect against an "error" which includes
an estimate of this additional variability. Thus, the expected
mean squares will be as shown in Tables 11.31 and 11.34 where
it is observed that each expected mean square contains all the
components of variance whose subscripts contain all the
letters representing the effect under scrutiny. This is the
mathematical way of expressing the "contamination" discussed
above. Consequently, the F-tests are as specified in
Tables 11.32 and 11.35. [NOTE: This illustrates the remark
made in Chapter 10, namely, "... the (proper) experimental
error for testing a particular effect."]

(3)  When  dealing  with  a  situation  in  which  Model  III  applies,  the 
conclusions  reached  about  any  particular  effect  may  or  may 
not  be  contaminated  by  other  effects.  That  is,  we  have  a 
mixture  of  cases  (1)  and  (2).  To  summarize  what  could  be  a 
rather  involved  discussion,  let  us  state  the  following  rule: 

The expected mean square for any effect will contain, in
addition to its own special term, all components of variance
which represent interactions between the effect under
scrutiny and other effects whose levels were randomly
selected. It will not contain components of variance
representing interactions between the effect under scrutiny and
other effects whose levels comprise the entire population
(of levels) to be investigated.

The expected mean squares specified in Tables 11.31 and 11.34
are, of course, simply results of the above reasoning and, as a
consequence, the F-tests are as shown in Tables 11.32 and
11.35. [NOTE: Again, this illustrates the remark made in
Chapter 10, namely, "... the (proper) experimental error
for testing a particular effect."]

To  aid  the  researcher  in  writing  out  expected  mean  squares  for  different 
situations,  the  following  sequence  of  steps  is  recommended: 

(1)  Include,  when  applicable,  a  component  of  variance  for  each 
subsampling  stage. 

(2)  Include  a  component  of  variance  representing  experimental 
error. 

(3)  Include every component of variance whose subscripts include
all the letters specifying the effect with which the expected
mean square is associated.

(4)  Insert  coefficients  in  front  of  each  component  of  variance  in 
accordance  with  the  approach  discussed  in  Section  11.5. 

(5)  Delete  from  the  set  specified  in  step  (3),  all  terms  representing 
interactions  between  the  effect  associated  with  the  expected 
mean  square  and  other  effects  whose  levels  were  not  randomly 
selected. 

(6)  For a main effect, replace the component of variance for that
effect by a "sum of squares divided by the appropriate degrees
of freedom" if the effect is a "fixed effect."
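Steps (3) and (5) above, together with the choice of a denominator for each F-ratio, lend themselves to a mechanical check. The following Python sketch is one possible reading of the rule and not a procedure given in the text: the function names `effects`, `ems_terms`, and `f_denominator` are hypothetical, factors are written as single capital letters, and coefficients (step 4) and subsampling (step 1) are ignored.

```python
from itertools import combinations

def effects(factors):
    """All main effects and interactions, e.g. 'A', 'B', 'AB' for factors 'AB'."""
    return ["".join(c) for r in range(1, len(factors) + 1)
            for c in combinations(factors, r)]

def ems_terms(effect, factors, random):
    """Terms (besides experimental error) in the expected mean square of `effect`.

    A term is kept when its letters include every letter of `effect`
    (step 3) and every additional letter names a random factor (step 5).
    """
    kept = []
    for term in effects(factors):
        extra = set(term) - set(effect)
        if set(effect) <= set(term) and extra <= set(random):
            kept.append(term)
    return kept

def f_denominator(effect, factors, random):
    """Mean square against which `effect` is tested; None if no exact test exists."""
    needed = [t for t in ems_terms(effect, factors, random) if t != effect]
    if not needed:
        return "E"  # tested against experimental error
    for other in effects(factors):
        if other != effect and ems_terms(other, factors, random) == needed:
            return other
    return None

# Three-factor factorial with a fixed, b and c random.
for eff in effects("ABC"):
    print(eff, "/", f_denominator(eff, "ABC", random="BC"))
```

Under these assumptions the sketch returns the denominators listed in Tables 11.32 and 11.35, with `None` standing for "no exact test."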

Before  presenting  illustrations  of  the  methods  discussed  in  this 
section,  some  additional  remarks  need  to  be  made.  In  the  interest  of 
economy,  these  are  presented  here  in  the  briefest  form  possible: 

(1)  It will have been noted that the degrees of freedom for
interaction effects were specified without any explanation. The
general rule is: For an interaction effect denoted by
ABCD ..., the degrees of freedom are

$$\nu = (a - 1)(b - 1)(c - 1)(d - 1) \cdots .$$

(2)  When, as in Table 11.35, no exact tests of certain hypotheses
are available, approximate tests can be made following
Satterthwaite's procedure. (See Section 11.7.) For example,
when Model II was assumed in Table 11.34, an approximate
test of $H: \sigma_\alpha^2 = 0$ is

$$F \cong A/[AB + AC - ABC].$$

(3)  Conclusions  (inferences)  about  one  factor  in  a  factorial  must 
take  due  cognizance  of  all  interactions  of  this  factor  with  other 
factors.  That  is,  recommendations  about  one  factor  must  give 
consideration  to  the  way  in  which  its  effect  is  influenced  by 
other  factors. 
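The approximate test mentioned in remark (2) needs degrees of freedom for the composite denominator AB + AC - ABC. Section 11.7 is not reproduced here, so the following Python sketch relies on the usual form of Satterthwaite's approximation, in which the squared combination is divided by the sum of the squared weighted mean squares each over its own degrees of freedom. The function name `satterthwaite` and all numerical values below are hypothetical.

```python
def satterthwaite(mean_squares, dfs, weights):
    """Linear combination of mean squares and its approximate degrees of freedom.

    Assumes the usual Satterthwaite form:
        df = (sum w_i MS_i)^2 / sum((w_i MS_i)^2 / df_i)
    """
    combined = sum(w * ms for w, ms in zip(weights, mean_squares))
    denom = sum((w * ms) ** 2 / v for w, ms, v in zip(weights, mean_squares, dfs))
    return combined, combined ** 2 / denom

# Hypothetical mean squares for a 3 x 4 x 3 factorial under Model II.
ms_a = 470.0
ms_ab, ms_ac, ms_abc = 250.0, 12.0, 10.0
df_ab, df_ac, df_abc = 6, 4, 12          # (a-1)(b-1), (a-1)(c-1), (a-1)(b-1)(c-1)

denominator, approx_df = satterthwaite(
    [ms_ab, ms_ac, ms_abc], [df_ab, df_ac, df_abc], [1, 1, -1]
)
f_approx = ms_a / denominator
print(f"F = {f_approx:.2f} with 2 and about {approx_df:.1f} degrees of freedom")
```

Because one weight is negative, the composite denominator can turn out small or even negative with unlucky data, in which case the approximation should not be trusted.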

Example  11.16 

Consider  a  4X3  factorial  in  a  completely  randomized  design  with 
three  experimental  units  per  treatment  combination.  The  data  are 
given  in  Table  11.36.  Proceeding  as  indicated,  the  following  sums  of 
squares  were  calculated: 

$$\begin{aligned}
\sum Y^2 &= 564{,}389 \\
M_{yy} &= (4023)^2/36 = 449{,}570.2 \\
T_{yy} = S_{ab} &= [(306)^2 + \cdots + (268)^2]/3 - 449{,}570.2 = 67{,}160.8 \\
E_{yy} &= 564{,}389 - 449{,}570.2 - 67{,}160.8 = 47{,}658.0 \\
A_{yy} &= [(726)^2 + (991)^2 + (1022)^2 + (1284)^2]/9 - 449{,}570.2 = 17{,}351.7 \\
B_{yy} &= [(1624)^2 + (1500)^2 + (899)^2]/12 - 449{,}570.2 = 25{,}061.2 \\
(AB)_{yy} &= 67{,}160.8 - 17{,}351.7 - 25{,}061.2 = 24{,}747.9.
\end{aligned}$$

These results are summarized in ANOVA form in Table 11.37, where
the expected mean squares are shown for each of the four cases
illustrated in Table 11.31. Since the data were hypothetical, no F-tests
will be performed. Such tests and the resulting inferences will be
illustrated in succeeding examples in which actual experimental data
will be examined.

TABLE 11.36-Hypothetical Data for Illustrating the ANOVA for a
4 × 3 Factorial in a Completely Randomized Design

        a1                 a2                 a3                 a4
   b1    b2    b3     b1    b2    b3     b1    b2    b3     b1    b2    b3
  128    34    16    152    40   118     76   102   132    180   220    60
   42   134    18    128    88    80    158    96    60     90   220    48
  136   172    46    216    76    93    168   162    68    150   156   160
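The arithmetic of Example 11.16 can be retraced directly from Equations (11.68) through (11.74). The following Python sketch is only an illustration; the function name `two_factor_ss` and the nested-list layout are hypothetical, and the assignment of the printed values to cells assumes that Table 11.36 is read row by row, a reading that reproduces the totals quoted in the example (for instance, 306 for the first cell and 726 for the first level of a).

```python
# data[i][j] holds the n = 3 observations for level i of a and level j of b (Table 11.36).
data = [
    [[128, 42, 136], [34, 134, 172], [16, 18, 46]],
    [[152, 128, 216], [40, 88, 76], [118, 80, 93]],
    [[76, 158, 168], [102, 96, 162], [132, 60, 68]],
    [[180, 90, 150], [220, 220, 156], [60, 48, 160]],
]

def two_factor_ss(data):
    """Sums of squares for a two-factor factorial in a completely randomized design."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(units) for units in row] for row in data]          # T_ij
    a_tot = [sum(row) for row in cell]                               # A_i
    b_tot = [sum(cell[i][j] for i in range(a)) for j in range(b)]    # B_j
    total_sq = sum(y * y for row in data for units in row for y in units)
    m = sum(a_tot) ** 2 / (a * b * n)                                # M_yy
    s_ab = sum(t * t for row in cell for t in row) / n - m           # S_ab = T_yy
    a_ss = sum(t * t for t in a_tot) / (b * n) - m                   # A_yy
    b_ss = sum(t * t for t in b_tot) / (a * n) - m                   # B_yy
    return {"M": m, "S_ab": s_ab, "A": a_ss, "B": b_ss,
            "AB": s_ab - a_ss - b_ss, "E": total_sq - m - s_ab}

ss = two_factor_ss(data)
for name, value in ss.items():
    print(f"{name:>5}: {value:12.1f}")
```

The results may differ from the printed figures in the last decimal place, because the text rounds $M_{yy}$ to 449,570.2 before subtracting.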

Example 11.17

Consider  an  agronomic  experiment  to  assess  the  effects  of  date  of 
planting  (early  or  late)  and  type  of  fertilizer  (none,  Aero,  Na,  or  K)  on 
the  yield  of  soybeans.  Thirty-two  homogeneous  experimental  plots  were 
available.  The  treatments  were  assigned  to  the  plots  at  random,  subject 
only  to  the  restriction  that  4  plots  be  associated  with  each  of  the  8 
treatment  combinations.  The  data  are  given  in  Table  11.38  and  the 
ANOVA  (assuming  Model  I)  in  Table  11.39. 

TABLE 11.38-Yields of Soybeans at the Agronomy Farm, Ames, Iowa, 1949
(In bushels per acre)

Date of                       Experimental Units Within Treatments
Planting     Fertilizer        1         2         3         4
Early        Check            28.6      36.8      32.7      32.6
             Aero             29.1      29.2      30.6      29.1
             Na               28.4      27.4      26.0      29.3
             K                29.2      28.2      27.7      32.0
Late         Check            30.3      32.3      31.6      30.9
             Aero             32.7      30.8      31.0      33.8
             Na               30.3      32.7      33.0      33.9
             K                32.7      31.7      31.8      29.4

Assuming $\alpha$ = 0.01, it is seen that the hypothesis "date of planting has
no effect" must be rejected. Examination of the mean yields indicates
that the later date of planting is better (i.e., is associated with higher
yields). Of course, more information is needed concerning the
distinction between "early" and "late" before explicit recommendations
can be made. No statistically significant effects were noted for either
fertilizers or for the interaction between fertilizers and date of planting.
(NOTE: Had $\alpha$ been chosen as 0.05, the interaction effect would have
been significant. This illustrates the dependence of the inferences upon
the choice of significance level, a fact which is sometimes overlooked,
or forgotten, by the analyst. That is, we must always remember that a
statement about significance or nonsignificance is a direct function of
the selected value of $\alpha$.)

TABLE 11.39-ANOVA for Experiment Described in Example 11.17
(Data Given in Table 11.38)

Source of                Degrees of   Sum of       Mean         F-
Variation                Freedom      Squares      Square       Ratio    Expected Mean Square
Mean                       1          30,368.80    30,368.80
Treatments
  Dates of planting        1              32.00        32.00    10.42    $\sigma^2 + (16/1)\sum_{i=1}^{2}\alpha_i^2$
  Fertilizers              3              16.40         5.47     1.78    $\sigma^2 + (8/3)\sum_{j=1}^{4}\beta_j^2$
  Fertilizers X dates
    of planting            3              38.40        12.80     4.17    $\sigma^2 + (4/3)\sum_{i=1}^{2}\sum_{j=1}^{4}(\alpha\beta)_{ij}^2$
Experimental error        24              73.74         3.07             $\sigma^2$
Total                     32          30,529.34
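The arithmetic behind Table 11.39 can be retraced with a short Python sketch. It is offered only as an illustration: the function name `two_factor_anova` and the dictionary layout are assumptions made here, not part of the text, and the code assumes a balanced two-factor factorial in a completely randomized design. Because it carries full precision, its results may differ from the tabled values in the second decimal place (for example, about 38.39 rather than 38.40 for the interaction sum of squares).

```python
# Yields from Table 11.38, keyed by (date of planting, fertilizer); 4 plots per cell.
yields = {
    ("Early", "Check"): [28.6, 36.8, 32.7, 32.6],
    ("Early", "Aero"): [29.1, 29.2, 30.6, 29.1],
    ("Early", "Na"): [28.4, 27.4, 26.0, 29.3],
    ("Early", "K"): [29.2, 28.2, 27.7, 32.0],
    ("Late", "Check"): [30.3, 32.3, 31.6, 30.9],
    ("Late", "Aero"): [32.7, 30.8, 31.0, 33.8],
    ("Late", "Na"): [30.3, 32.7, 33.0, 33.9],
    ("Late", "K"): [32.7, 31.7, 31.8, 29.4],
}


def two_factor_anova(data):
    """Return {source: (df, SS, MS, F)} for a balanced two-factor factorial."""
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    n = len(next(iter(data.values())))          # units per treatment combination
    total_n = n * len(data)
    mean_ss = sum(sum(cell) for cell in data.values()) ** 2 / total_n
    treat_ss = sum(sum(cell) ** 2 for cell in data.values()) / n - mean_ss

    a_totals = [sum(sum(data[a, b]) for b in b_levels) for a in a_levels]
    b_totals = [sum(sum(data[a, b]) for a in a_levels) for b in b_levels]
    ss = {
        "dates": sum(t * t for t in a_totals) / (n * len(b_levels)) - mean_ss,
        "fertilizers": sum(t * t for t in b_totals) / (n * len(a_levels)) - mean_ss,
    }
    ss["interaction"] = treat_ss - ss["dates"] - ss["fertilizers"]
    ss["error"] = sum(y * y for cell in data.values() for y in cell) - mean_ss - treat_ss

    df = {
        "dates": len(a_levels) - 1,
        "fertilizers": len(b_levels) - 1,
        "interaction": (len(a_levels) - 1) * (len(b_levels) - 1),
        "error": total_n - len(data),
    }
    error_ms = ss["error"] / df["error"]
    table = {}
    for source in ss:
        ms = ss[source] / df[source]
        f_ratio = None if source == "error" else ms / error_ms
        table[source] = (df[source], ss[source], ms, f_ratio)
    return table


anova = two_factor_anova(yields)
for source, (df, ss, ms, f_ratio) in anova.items():
    f_text = "" if f_ratio is None else f"{f_ratio:.2f}"
    print(f"{source:12s} {df:3d} {ss:8.2f} {ms:8.2f} {f_text}")
```

The treatment sum of squares is obtained first from the cell totals and then split into its three parts, which mirrors the order of computation used in the chapter.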

Example  11.18 

Consider a 3 × 4 × 3 factorial in a completely randomized design with
6 experimental units per treatment combination. The data are given
in Table 11.40. Proceeding as directed earlier, Tables 11.41 through
11.44 were obtained and the following sums of squares calculated:

$\sum Y^2$ = 27,981
$M_{yy}$ = 19,703.56
$T_{yy} = S_{abc}$ = 3283.27
$E_{yy}$ = 4994.17
$S_{ab}$ = 2913.27
$S_{ac}$ = 1065.32
$S_{bc}$ = 670.83
$A_{yy}$ = 941.79
$B_{yy}$ = 463.79
$C_{yy}$ = 84.93
$(AB)_{yy}$ = 1507.69
$(AC)_{yy}$ = 38.60
$(BC)_{yy}$ = 122.11
$(ABC)_{yy}$ = 124.36.
TABLE 11.40-Hypothetical Data for Illustrating the ANOVA for a
3 × 4 × 3 Factorial in a Completely Randomized Design

                   a1                    a2                    a3
            b1   b2   b3   b4     b1   b2   b3   b4     b1   b2   b3   b4
c1           3   10    9    8     24    8    9    3      2    8    9    8
             2   10    9    8     29   16   11    3      2    7    5    3
             8   10    2    8     27   16   15    8      2   15    7   14
             1    6    8   14     14   13    8    5      9   30    9    2
             7    8    9    6     18   10    2   16     14    7    6   11
             8    1   10   12      3    8    8    4     11    2    2    9
   Total    29   45   47   56    115   71   53   39     40   69   38   47

c2           4   12    3    8     22    7   16    2      2    2    7    2
             7   10    5    8     28   18   10    6      6    6    5    9
             7    9    2    7     27   15   12    7      7   16    1   13
            14    5    7   15     34   11    9    5     13   11    8    3
             7    9    8    2     19    9   12   12     13    6    6   12
             7    6   12    3      3   15    8    4     12    3    2   10
   Total    46   51   37   43    133   75   67   36     53   44   29   49

c3           5   10    5    8     23    9   17    3      2    8    6    3
             9   10   27    8     28   16   11    7      8    9    8   15
            15    7    6   15     30   14   12    5     11   18    3    8
             8    6    4   18     16   12   13   15     17    8    7   16
             7   17    3   10     17   10   20    9      9    8    6   17
             3    2   10    5      3    7    8    6     11    7    3   14
   Total    47   52   55   64    117   68   81   45     58   58   33   73

These  results  are  summarized  in  ANOVA  form  in  Table  11.45.  Since 
the  data  were  hypothetical,  no  expected  mean  squares  are  given. 
Neither are any F-tests performed. The reader is referred to the problems
at the end of the chapter for illustrations of various tests and the
resulting  inferences. 


TABLE 11.41-a × b × c Table Formed From the Data of Table 11.40

                a1                    a2                    a3
         b1   b2   b3   b4     b1   b2   b3   b4     b1   b2   b3   b4
c1       29   45   47   56    115   71   53   39     40   69   38   47
c2       46   51   37   43    133   75   67   36     53   44   29   49
c3       47   52   55   64    117   68   81   45     58   58   33   73
TABLE 11.42-a × b Table Formed From the Data of Table 11.40

          a1    a2    a3
b1       122   365   151
b2       148   214   171
b3       139   201   100
b4       163   120   169

TABLE 11.43-a × c Table Formed From the Data of Table 11.40

          a1    a2    a3
c1       177   278   194
c2       177   311   175
c3       218   311   222

TABLE 11.44-b × c Table Formed From the Data of Table 11.40

          b1    b2    b3    b4
c1       184   185   138   142
c2       232   170   133   128
c3       222   178   169   182


TABLE 11.45-ANOVA for Data of Table 11.40

Source of Variation      Degrees of    Sum of Squares    Mean Square
                         Freedom
Mean                        1            19,703.56        19,703.56
Treatments
  A                         2               941.79           470.90
  B                         3               463.79           154.60
  C                         2                84.93            42.46
  AB                        6             1,507.69           251.28
  AC                        4                38.60             9.65
  BC                        6               122.11            20.35
  ABC                      12               124.36            10.36
Experimental error        180             4,994.17            27.75
Total                     216            27,981.00


Even though this section is already quite long, there are several
items which need mentioning before we leave (for the time being) the
subject of factorials. These items are: (1) general computational
procedures for factorials involving four or more factors, (2) special
computational methods for $2^n$ and $3^n$ factorials, (3) subsampling in
completely randomized designs involving factorial treatment combinations,
and (4) analysis of response curves associated with the various main
effects and interactions. A brief discussion of each of these will be
given in the following paragraphs.

The general computational procedure for factorials proceeds as
follows. First compute $\sum Y^2$, $M_{yy}$, $T_{yy}$, $E_{yy}$, and any sums of squares
required because of subsampling. Then, to subdivide $T_{yy}$ in, say, a
four-factor factorial, form in succession the four-way table, all
three-way tables, and all two-way tables. As each table is formed, compute
the border totals as a check on the entries you have made in the cells
of the tables. Then, starting with the two-way tables, calculate the
sums of squares for each of the main effects and for each of the
two-factor interactions. Then, proceeding to the three-way tables, calculate
the sums of squares associated with each of the three-factor
interactions. And, finally, utilizing the four-way table, the sum of
squares associated with the four-factor interaction may be obtained.
The extension to 5, 6, ..., N factors is easy. After obtaining the
basic sums of squares, form the N-way table, all possible (N - 1)-way
tables, all possible (N - 2)-way tables, ..., all possible three-way
tables, and all possible two-way tables in the order mentioned. Then
calculate, in the following order, all main effect sums of squares, all
two-factor interaction sums of squares, ..., all (N - 1)-factor
interaction sums of squares, and the N-factor interaction sum of squares.
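The procedure just described lends itself to a short program. The Python sketch below is one possible rendering, not a prescribed method: the function name `factorial_sums_of_squares` and its arguments are hypothetical, and it assumes a balanced factorial with the same number of experimental units, n, behind every cell total. For each subset of factors it collapses the full table of cell totals onto those factors, computes the sum of squares among the resulting totals, and subtracts every lower-order sum of squares already obtained. The cell totals of Table 11.41 serve as a check against Table 11.45.

```python
from itertools import combinations


def factorial_sums_of_squares(cell_totals, n):
    """Split the treatment sum of squares of a balanced factorial.

    cell_totals maps a tuple of level indices (one per factor) to the total
    of the n observations in that cell. Returns (M, ss), where M is the sum
    of squares for the mean and ss maps each tuple of factor positions to
    its main-effect or interaction sum of squares.
    """
    n_factors = len(next(iter(cell_totals)))
    total_n = n * len(cell_totals)
    mean_ss = sum(cell_totals.values()) ** 2 / total_n
    ss = {}
    for order in range(1, n_factors + 1):
        for keep in combinations(range(n_factors), order):
            # Collapse the full table onto the factors listed in `keep`.
            table = {}
            for index, total in cell_totals.items():
                key = tuple(index[f] for f in keep)
                table[key] = table.get(key, 0) + total
            units = total_n // len(table)        # observations per entry
            among = sum(t * t for t in table.values()) / units - mean_ss
            # Remove all lower-order effects contained in this table.
            lower = sum(ss[sub]
                        for r in range(1, order)
                        for sub in combinations(keep, r))
            ss[keep] = among - lower
    return mean_ss, ss


# Rows of Table 11.41: level of c -> totals for a1b1..a1b4, a2b1..a2b4, a3b1..a3b4.
rows = {
    0: [29, 45, 47, 56, 115, 71, 53, 39, 40, 69, 38, 47],
    1: [46, 51, 37, 43, 133, 75, 67, 36, 53, 44, 29, 49],
    2: [47, 52, 55, 64, 117, 68, 81, 45, 58, 58, 33, 73],
}
# Index each cell total by (level of a, level of b, level of c).
cell_totals = {(j // 4, j % 4, c): t
               for c, row in rows.items() for j, t in enumerate(row)}

M, ss = factorial_sums_of_squares(cell_totals, n=6)
for effect, value in ss.items():
    print("".join("ABC"[f] for f in effect), round(value, 2))
```

Because the loop runs over every subset of factors in order of increasing size, the same function handles four or more factors without change; only the index tuples grow longer.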

Whenever all the factors are at p levels, and there are n factors, the
statistician refers to such an arrangement as a $p^n$ factorial. Of particular
interest are those cases where p = 2 or 3. When such cases arise, there
are  available  to  the  research  worker  certain  special  computational 
techniques.  These  are  explained  in  considerable  detail  in  such  references 
as  Yates  (39)  and  Kempthorne  (23),  and  may  be  pursued  by  those 
readers  whose  primary  interest  is  in  computation.  Since  the  methods 
outlined  earlier  in  this  section  are  valid  for  all  cases,  there  seems  little 
reason  to  burden  the  reader  with  a  specialized  technique  at  this  time. 
Accordingly,  we  shall  do  no  more  than  has  already  been  done,  that  is, 
point  out  the  existence  of  the  methods  and  give  pertinent  references 
for  the  use  of  interested  persons. 

When subsampling occurs in a completely randomized design involving
factorial treatment combinations, the methods of analysis are
simply a combination of those given in this section and Section 11.4.
Thus, no detailed discussion of computational techniques will be
presented at this time. However, to illustrate the nature of the ANOVA's,
two cases will be mentioned. The first of these will involve only one
subsampling stage, while the second will involve two stages of
subsampling. If only one stage of subsampling is involved, the appropriate
statistical model (for a two-factor factorial) is

$$Y_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk} + \eta_{ijkl} \qquad (11.94)$$

$$i = 1, \ldots, a; \quad j = 1, \ldots, b; \quad k = 1, \ldots, n; \quad l = 1, \ldots, p,$$
TABLE 11.46-Abbreviated ANOVA for a Two-Factor Factorial in a
Completely Randomized Design Involving One Stage of Subsampling
(Model I)

Source of Variation      Degrees of Freedom    Expected Mean Square
Mean                     1
Treatments
  A                      a - 1                 $\sigma_\eta^2 + p\sigma^2 + pnb\sum_{i=1}^{a}\alpha_i^2/(a-1)$
  B                      b - 1                 $\sigma_\eta^2 + p\sigma^2 + pna\sum_{j=1}^{b}\beta_j^2/(b-1)$
  AB                     (a - 1)(b - 1)        $\sigma_\eta^2 + p\sigma^2 + pn\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2/(a-1)(b-1)$
Experimental error       ab(n - 1)             $\sigma_\eta^2 + p\sigma^2$
Sampling error           abn(p - 1)            $\sigma_\eta^2$
Total                    abnp

and the ANOVA would appear (in abbreviated form) as in Table 11.46.
(NOTE: If only one sample were obtained from each experimental unit,
e.g., if one small sample is taken from a field plot to estimate the yield
of the entire plot, p in Table 11.46 is set equal to 1 and the line for
"sampling error" is deleted. However, if the whole plot is harvested,
the sampling error is 0 and the ANOVA would be as shown in Table
11.30.) In the second case to be examined, that is, a case involving
two stages of subsampling, the appropriate statistical model (for a
two-factor factorial) is

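$$Y_{ijklm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk} + \eta_{ijkl} + \delta_{ijklm} \qquad (11.95)$$

$$i = 1, \ldots, a; \quad j = 1, \ldots, b; \quad k = 1, \ldots, n; \quad l = 1, \ldots, p; \quad m = 1, \ldots, d,$$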

and the ANOVA would appear (in abbreviated form) as in Table 11.47.
The extension to cases involving more than two stages of subsampling
should be obvious.

TABLE 11.47-Abbreviated ANOVA for a Two-Factor Factorial in a
Completely Randomized Design Involving Two Stages of Subsampling
(Model I)

Source of Variation      Degrees of Freedom    Expected Mean Square
Mean                     1
Treatments
  A                      a - 1                 $\sigma_\delta^2 + d\sigma_\eta^2 + dp\sigma^2 + dpnb\sum_{i=1}^{a}\alpha_i^2/(a-1)$
  B                      b - 1                 $\sigma_\delta^2 + d\sigma_\eta^2 + dp\sigma^2 + dpna\sum_{j=1}^{b}\beta_j^2/(b-1)$
  AB                     (a - 1)(b - 1)        $\sigma_\delta^2 + d\sigma_\eta^2 + dp\sigma^2 + dpn\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2/(a-1)(b-1)$
Experimental error       ab(n - 1)             $\sigma_\delta^2 + d\sigma_\eta^2 + dp\sigma^2$
First stage sampling
  error                  abn(p - 1)            $\sigma_\delta^2 + d\sigma_\eta^2$
Second stage sampling
  error                  abnp(d - 1)           $\sigma_\delta^2$
Total                    abnpd

As indicated in Section 11.11, it is often advisable to examine the
response curve which summarizes the effects of the various levels of a
factor upon the characteristic being measured. When our data fit a
factorial arrangement, we may find it possible to examine response
curves associated with the levels of 2 or more factors. For example, if
we have 2 factors, a and b, we may subdivide the 2 sums of squares,
$A_{yy}$ and $B_{yy}$, into parts designated as $(A_L)_{yy}$, $(A_Q)_{yy}$, ..., and
$(B_L)_{yy}$, $(B_Q)_{yy}$, ..., respectively. That is, we may obtain the linear,
quadratic, ..., sums of squares associated with each of the factors a and b.
However, since we are now dealing with factorials, it is also possible
to subdivide the interaction sum of squares, $(AB)_{yy}$. The parts into
which $(AB)_{yy}$ may be subdivided will be designated as $(A_L B_L)_{yy}$,
$(A_L B_Q)_{yy}$, $(A_Q B_L)_{yy}$, $(A_Q B_Q)_{yy}$, .... If a third factor, c, were present,
we would then have such quantities as $(C_L)_{yy}$, $(C_Q)_{yy}$, ...,
$(A_L B_L C_L)_{yy}$, etc. The number of
possible subdivisions is, of course, limited by the number of levels of
the  various  factors  involved.  Because  we  have  already  devoted  so 
much  time  to  the  discussion  of  factorials  in  a  completely  randomized 
design,  the  details  of  this  technique  (i.e.,  response  curve  analyses  for 
the  various  main  effects  and  interactions)  will  not  be  discussed  here. 
However,  the  technique  will  be  discussed  in  the  following  chapter  in 
connection  with  a  randomized  complete  block  design.  Since  the  method 
is  the  same  regardless  of  the  design  (as  long  as  the  completely  random 
ized  design  has  equal  numbers  of  observations  in  each  category),  the 
person  desiring  the  details  now  can  jump  ahead  and  read  Section  12.12. 
The reader will appreciate, I am certain, that the foregoing discussion
has only scratched the surface of the subject of analyzing factorials.
However,  I  also  feel  that  sufficient  material  has  been  given  to  enable 
the  researcher  to  handle  the  most  commonly  occurring  situations. 
Should  more  complex  situations  arise,  reference  to  one  or  more  of  the 
books  listed  at  the  end  of  the  chapter  should  prove  helpful.  If  not,  a 
professional  statistician  should  be  consulted. 

11.13     NONCONFORMITY     TO     ASSUMED     STATISTICAL 
MODELS 

By  now  the  reader  is  well  aware  that  the  usual  assumptions  in  analy 
sis  of  variance  involve  the  concepts  of  additivity,  normality,  homo 
geneity  of  variances,  and  independence  of  the  errors.  However,  up  to 
this  point,  little  has  been  said  about:  (1)  tests  to  assess  the  validity 
of  the  assumptions,  (2)  the  consequences  if  the  assumptions  are  not 
satisfied,  and  (3)  transformations  which,  if  applied  to  the  original  data, 
may  justify  the  use  of  the  assumptions  in  connection  with  the  trans 
formed  data  (i.e.,  the  data  as  they  appear  after  the  transformation  has 
been  applied).  In  this  section  each  of  these  topics  will  be  discussed 
briefly.  For  those  who  wish  more  details,  several  references  are  given. 
In  particular,  three  excellent  expository  articles  are  those  by  Bartlett 
(3),  Cochran  (9),  and  Eisenhart  (18). 

First,  let  us  consider  various  statistical  tests  that  have  been  proposed 
to  check  on  the  validity  of  the  several  assumptions. 

Homogeneity  of  Variances 

In Section 7.21, Bartlett's test was given for testing the hypothesis
H: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ where a random sample of $n_i$ observations had
been taken from the ith normal population (i = 1, ..., k). Clearly, this
test is appropriate for checking on the homogeneity of variances.
However, Bartlett's test has been shown to be quite sensitive to
nonnormality. Thus, if nonnormality is suspected or has been
demonstrated, the test should be modified as suggested by Box and Anderson
(5). For a discussion of other tests, the reader is referred to Anscombe
and Tukey (2), Box and Anderson (5), David (12), and Dixon and
Massey (14).
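As a rough illustration of how such a check might be carried out, the Python sketch below computes Bartlett's statistic in its usual textbook form. It is assumed, not confirmed here, that this form agrees with the one given in Section 7.21, and the function name `bartlett_statistic` is hypothetical. The statistic is referred to the chi-square distribution with k - 1 degrees of freedom; the eight treatment groups of Table 11.38 are used purely as an example, and the value 14.07 is the tabled upper 5 per cent point of chi-square with 7 degrees of freedom. The modification of Box and Anderson is not attempted.

```python
import math


def bartlett_statistic(samples):
    """Bartlett's statistic for homogeneity of variances across k samples."""
    k = len(samples)
    dfs = [len(s) - 1 for s in samples]
    variances = []
    for s in samples:
        mean = sum(s) / len(s)
        variances.append(sum((y - mean) ** 2 for y in s) / (len(s) - 1))
    df_total = sum(dfs)
    pooled = sum(df * v for df, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        df * math.log(v) for df, v in zip(dfs, variances))
    correction = 1 + (sum(1 / df for df in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction


# The eight treatment groups of Table 11.38 (4 plots each).
groups = [
    [28.6, 36.8, 32.7, 32.6], [29.1, 29.2, 30.6, 29.1],
    [28.4, 27.4, 26.0, 29.3], [29.2, 28.2, 27.7, 32.0],
    [30.3, 32.3, 31.6, 30.9], [32.7, 30.8, 31.0, 33.8],
    [30.3, 32.7, 33.0, 33.9], [32.7, 31.7, 31.8, 29.4],
]
statistic = bartlett_statistic(groups)
critical_5_percent = 14.07   # chi-square, 7 degrees of freedom, upper 5% point
print(f"Bartlett statistic = {statistic:.2f}")
print("reject homogeneity at 5%" if statistic > critical_5_percent
      else "no evidence against homogeneity at 5%")
```

With only 3 degrees of freedom per group the test has little power, so a nonsignificant result here should not be read as strong support for equal variances.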

Normality 

To check on the assumption of normality, one can use the chi-square
test of goodness of fit given in Section 7.15. An alternative, and perhaps
preferred, method is the Kolmogorov-Smirnov test discussed in
Chapter 15. For those who are satisfied with a less objective approach,
the data (or the residuals) may be plotted on normal probability paper
and a subjective judgment rendered.

Additivity 

When the assumption of additivity is questioned, the problem is
somewhat more involved. This is so because there are three major
causes of nonadditivity, namely, (1) the true effects may be
multiplicative, (2) interactions may exist but terms representing such effects
have not been included in the assumed model, and (3) aberrant
observations may be present. If the experimental design is such that
interaction effects may be isolated, the methods of the preceding section
may be used to check on (2). However, if this is not possible, the
researcher may use the more general tests suggested by Tukey (32, 37)
and by Ward and Dick (38). Rather than give the details of these tests,
we refer the reader to the original publications. If access to these
publications is not possible, perhaps the illustrations in Snedecor (31) and
Hamaker (21) will suffice.

Independence

The assumption of independence or, granting normality, of uncorrelated
errors is a crucial assumption and its importance should not
be  overlooked.  Of  course,  by  utilization  of  the  device  of  randomization, 
the  researcher  can  do  his  best  to  see  that  the  correlation  between  errors 
will  not  continually  favor  (or  hinder)  any  particular  treatment.  If  one 
wishes  to  test  for  randomness,  methods  are  available.  However,  since 
these  will  be  discussed  in  Chapters  15  and  16,  no  details  will  be  given 
at  this  time.  The  interested  reader  may  jump  ahead  to  the  appropriate 
sections.  (NOTE:  The  procedure  discussed  in  Section  11.6  may  also 
be  helpful  in  this  situation.) 

In general, the consequences are not serious when the assumptions
made in connection with analyses of variance are not strictly satisfied.
That is, moderate departures from the conditions specified by the
assumptions need not alarm us. For example, minor deviations from
normality and/or some degree of heteroschedasticity (lack of
homogeneity of variances) will have little effect on the usual tests and the
resulting inferences. In summary, the analysis of variance technique
is quite robust, and thus the researcher can rely on its doing a good job
under most circumstances. However, since trouble can arise because of
failure of the data to conform to the assumptions, ways of handling
such situations must be examined.

When some action is needed to make the data conform to the usual
assumptions, a common approach is to transform the original data
in such a way that the transformed data will meet the conditions
specified by the assumptions. For example, if the true effects are
multiplicative instead of additive, it is customary to take logarithms and thus
change, for instance,

$$Y = \mu\alpha_i\beta_j\epsilon_{ij} \qquad (11.96)$$

into

$$Y' = \log Y = \log\mu + \log\alpha_i + \log\beta_j + \log\epsilon_{ij}. \qquad (11.97)$$

Fortunately, in most cases, one transformation will suffice. That is, it
is usually not necessary to make a series of transformations, each to
correct a separate "deficiency" in the original data. The reason for
this fortunate state of affairs is that, in general, the utilization of a
transformation to correct one particular deficiency (say, nonadditivity)
will also help with respect to another deficiency (say, nonnormality).
With this in mind, the more common transformations are summarized
in Table 11.48. Further details may be found in Bartlett (3) and
Tukey (32).

TABLE 11.48-Some Common Transformations

Transformation
Name           Equation                   Conditions Leading to Its Application

Logarithmic    $Y' = \log Y$              1. The true effects are multiplicative (or
                                             proportional),
                                          or
                                          2. The standard deviation is proportional
                                             to the mean.

Square root    $Y' = \sqrt{Y}$            The variance is proportional to the
               or                         mean (e.g., when the original data are
               $Y' = \sqrt{Y + 1}$        samples from a Poisson distribution).

Arcsine        $Y' = \arcsin\sqrt{p}$     The variance is proportional to
                                          $\mu(1 - \mu)$ as, for example, when the
                                          original data are samples (expressed as
                                          proportions or relative frequencies)
                                          from binomial populations.

Reciprocal     $Y' = 1/Y$                 The standard deviation is proportional
                                          to the square of the mean.
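The transformations of Table 11.48 are simple to apply in code. The Python sketch below is illustrative only: the dictionary `TRANSFORMS`, the helper `interaction_contrast`, and the numerical effects are all hypothetical, and common (base 10) logarithms are an assumption, since any base serves equally well. The example builds an error-free 2 × 2 table with multiplicative effects, as in equation (11.96), and shows that the interaction contrast, which is far from zero on the original scale, vanishes after the logarithmic transformation of equation (11.97).

```python
import math

# The transformations of Table 11.48, each applied to a single observation.
TRANSFORMS = {
    "logarithmic": lambda y: math.log10(y),
    "square root": lambda y: math.sqrt(y),
    "square root (Y + 1)": lambda y: math.sqrt(y + 1),
    "arcsine": lambda p: math.asin(math.sqrt(p)),   # p is a proportion in [0, 1]
    "reciprocal": lambda y: 1 / y,
}


def interaction_contrast(table):
    """Y11 - Y12 - Y21 + Y22 for a 2 x 2 table; zero when effects are additive."""
    return table[0][0] - table[0][1] - table[1][0] + table[1][1]


# Hypothetical multiplicative effects with no error term: Y = mu * alpha_i * beta_j.
mu, alpha, beta = 10.0, (1.0, 2.0), (1.0, 3.0)
raw = [[mu * a * b for b in beta] for a in alpha]
logged = [[TRANSFORMS["logarithmic"](y) for y in row] for row in raw]

print("original scale:", interaction_contrast(raw))
print("log scale:     ", round(interaction_contrast(logged), 12))
```

In real data the error term is also transformed, so the contrast on the log scale will be small rather than exactly zero.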

Before leaving the subject matter of this section, one other technique
for handling heterogeneous variances should be mentioned. This
technique is as follows: Partition the experimental error sum of squares in
correspondence with any partitioning of the treatment sum of squares.
However, this technique, valid though it may be, is seldom employed
because: (1) it is difficult and time-consuming to perform and (2) each
portion of $E_{yy}$ will usually possess a very small number of degrees of
freedom so that the subsequent F-tests will be of little value (i.e., they
will not be very powerful or discriminating tests). Because this
technique is used so rarely, no further discussion will be given at this time.
However, an example of subdividing the experimental error sum of
squares will be presented in the next chapter.

11.14     THE    RELATION     BETWEEN    ANALYSIS    OF    VARI 
ANCE  AND   REGRESSION   ANALYSIS 

Perhaps the most concise statement that can be made concerning the
relation between analysis of variance and regression analysis is the
following: Analysis of variance and regression analysis are essentially
the same. Why, then, have we spent so much time (and we are not
through yet) discussing analysis of variance as a separate topic? The
answer is: Because there are many cases (based on specific conditions)
that are more easily explained using the methods of this and succeeding
chapters than those given in Chapter 8.

Because of the complexity of the topic, the general equivalence of the
two methods (i.e., analysis of variance and regression analysis) will not
be discussed in this book. The interested reader is referred to Graybill
(20) and Kempthorne (23) for a general discussion of the basic theory,
and to Chew (8) for some illustrative examples.

11.15      PRESENTATION   OF    RESULTS 

Even  though  an  ANOVA  table  is  very  convenient  for  summarizing 
certain  aspects  of  the  analysis  of  a  set  of  data,  it  suffers  from  a  rather 
serious  deficiency,  namely,  that  it  tends  to  overemphasize  tests  of 
hypotheses  and  underemphasize  estimation.  Since  estimation  is  the 
more  important  of  these  two  aspects  of  statistical  inference,  this  could 
be  serious  if  steps  are  not  taken  to  remedy  the  situation.  Two  steps  that 
can  be  taken  to  improve  matters  are:  (1)  always  accompany  an 
ANOVA table with tables of means, together with their standard
errors, and (2) whenever possible, portray the results in graphical form.
If  these  two  steps  are  taken  and  if  a  readable  report  is  prepared,  the 
results  of  your  research  will  be  more  easily  understood  and  appreciated. 

Example  11.19 

Re-examination  of  Example  11.3  will  show  that  the  means  were 
given  in  Table  11.9,  the  ANOVA  in  Table  11.10,  and  the  standard  error 
of  the  mean  in  the  discussion.  Actually,  the  standard  error,  which  was 
the  same  for  each  mean  because  of  the  equal  sample  sizes,  might  better 
have  been  included  in  Table  11.9. 

Example  11.20 

Re-examination  of  Example  11.4  will  show  that  the  suggestion  made 
in  Example  11.19  was  adopted  in  that  case.  That  is,  the  standard  errors 
were  presented  along  with  the  means  to  which  they  applied. 

Example  11.21 

Re-examination  of  Example  11.6  will  show  that  the  ANOVA  was 
given  in  Table  11.18  and  the  standard  error  of  a  treatment  mean  was 
included  in  the  discussion.  However,  the  treatment  means  were  not 
explicitly  exhibited  although  they  could  easily  have  been  obtained.  Had 
a  complete  report  of  the  research  been  prepared,  this  deficiency  would 
have  been  noted  and  removed. 

Example  11.22 

Re-examination of Examples 11.11 and 11.12 will show that standard
errors were (implicitly) found for each of the selected contrasts. As
noted in the discussion of Example 11.12, the point and interval
estimates of the true effects of the contrasts could then be calculated. These
would, of course, be included in the research report.

Example  11.23 

Re-examination  of  Example  11.15  will  indicate  that  any  research 
report  concerning  this  experiment  would  have  benefited  by  a  graph 
showing  the  treatment  means  (average  yields)  as  a  function  of  the 
amount  of  fertilizer  applied  to  the  experimental  plots.  It  is  suggested 
that  the  reader  plot  these  means  and  examine  the  graph  in  connection 
with  the  recommendations  made  in  Example  11.15. 

Example  11.24 

Re-examination of Example 11.17 will reveal that the treatment
means were not given. Since they are pertinent to the conclusions, we
give them in Table 11.49.

TABLE 11.49-Treatment Means for the Experiment Discussed in Example
11.17 (Data in Table 11.38; ANOVA in Table 11.39)

                        Date of Planting
Fertilizer       Early            Late             Average
Check            32.68 (0.88)     31.28 (0.88)     31.98 (0.62)
Aero             29.50 (0.88)     32.08 (0.88)     30.79 (0.62)
Na               27.78 (0.88)     32.48 (0.88)     30.12 (0.62)
K                29.28 (0.88)     31.40 (0.88)     30.34 (0.62)
Average          29.81 (0.44)     31.81 (0.44)     30.81

The figures in parentheses in the table are the standard errors of the means to which
they are appended.

The standard errors of the treatment means shown in Table 11.49 were
calculated by taking the square roots of the following estimated variances:

$V(\bar{Y}_{i..})$ = V(date of planting mean) = 3.07/16 = 0.1919

$V(\bar{Y}_{.j.})$ = V(fertilizer mean) = 3.07/8 = 0.3838

$V(\bar{Y}_{ij.})$ = V(date of planting X fertilizer mean) = 3.07/4 = 0.7675.
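The three standard errors can be reproduced in a few lines of Python. The helper name `standard_error_of_mean` is an illustrative assumption; the error mean square 3.07 is taken from Table 11.39, and the divisors 16, 8, and 4 are the numbers of plots entering a date of planting mean, a fertilizer mean, and a treatment combination mean, respectively.

```python
import math


def standard_error_of_mean(error_mean_square, n_obs):
    """Standard error of a mean of n_obs observations, given the error mean square."""
    return math.sqrt(error_mean_square / n_obs)


error_ms = 3.07                                       # experimental error mean square
se_date = standard_error_of_mean(error_ms, 16)        # date of planting mean
se_fertilizer = standard_error_of_mean(error_ms, 8)   # fertilizer mean
se_cell = standard_error_of_mean(error_ms, 4)         # date X fertilizer mean

for label, se in [("date of planting", se_date),
                  ("fertilizer", se_fertilizer),
                  ("date X fertilizer", se_cell)]:
    print(f"{label:18s} {se:.2f}")
```

Rounded to two decimals these are the values shown in parentheses in Table 11.49.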

A  graphical  presentation  of  the  means  is  given  in  Figure  11.1  where, 
of  course,  the  reader  must  realize  that  the  slopes  of  the  lines  are  a  direct 
reflection  of  the  scales  adopted.  However,  since  our  main  use  of  the 
graph  will  be  in  the  interpretation  of  the  interaction,  this  will  not  matter, 
for  we  shall  be  concerned  only  with  the  slopes  of  the  lines  relative  to 
one  another.  A  study  of  Figure  11.1  will  confirm  the  conclusions 
reached  in  Example  11.17,  namely:  (1)  the  late  date  of  planting  is 
apparently  better  than  the  early  date  of  planting,  (2)  there  is  little 
difference  among  the  main  effects  of  the  four  fertilizers,  and  (3)  there 
is  some  indication  of  a  possible  interaction.  (NOTE:  This  last  conclu 
sion  is  suggested  by  the  lack  of  "parallelism"  of  the  plotted  lines.) 

FIG. 11.1-Graphical representation of the mean yields given in Table 11.49.

In addition to the remarks made in the first paragraph of this section
and illustrated in Examples 11.19 through 11.24, the reader should
realize that many experiments are conducted and analyses of variance
performed only to estimate components of variance. Important as this
topic is, it is felt that the discussion given earlier in the chapter will
prove sufficient for most applications. Should further details be desired,
it is suggested that a professional statistician be consulted.

One  other  topic  should  be  mentioned  in  connection  with  the  presen 
tation  of  results.  This  topic  is  concerned  with  the  general  way  in  which 
ANOVA's  are  commonly  presented.  Two  customs  have  become  quite 
firmly  established  over  the  years  and  they  are  as  follows: 

1. (a) If an F-ratio exceeds the 95 per cent point but does not
       exceed the 99 per cent point, the F-ratio (or the mean
       square for the effect being tested) is tagged with a single
       asterisk (*).

   (b) If an F-ratio exceeds the 99 per cent point, the F-ratio
       (or the mean square for the effect being tested) is tagged
       with a double asterisk (**).

2. If space is at a premium, only an abbreviated ANOVA will be
   presented. When this is done, it is customary to include only
   the columns for: (1) sources of variation, (2) degrees of freedom,
   and (3) mean squares.

Incidentally,  when  the  asterisk  convention  is  used,  it  is  good  practice 
to  define  the  symbols  at  the  bottom  of  every  ANOVA  table  by  use  of 
the  following  footnotes: 

*  Significant at $\alpha$ = 0.05.
** Significant at $\alpha$ = 0.01.
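The asterisk convention is mechanical enough to express as a small function. The Python sketch below is a hypothetical illustration: the function name `significance_tag` is an assumption, and the 95 and 99 per cent points (4.26 and 7.82 for 1 and 24 degrees of freedom; 3.01 and 4.72 for 3 and 24 degrees of freedom) are quoted from a standard F table and should be checked against the table in use. Applied to the F-ratios of Table 11.39, it also shows how the interaction is tagged at the 5 per cent level but not at the 1 per cent level.

```python
def significance_tag(f_ratio, point_95, point_99):
    """Return '**', '*', or '' according to the customary asterisk convention."""
    if f_ratio > point_99:
        return "**"
    if f_ratio > point_95:
        return "*"
    return ""


# (source, F-ratio from Table 11.39, 95 per cent point, 99 per cent point)
rows = [
    ("Dates of planting", 10.42, 4.26, 7.82),
    ("Fertilizers", 1.78, 3.01, 4.72),
    ("Fertilizers X dates of planting", 4.17, 3.01, 4.72),
]
tags = {source: significance_tag(f, p95, p99) for source, f, p95, p99 in rows}
for source, f, _, _ in rows:
    print(f"{source:32s} {f:6.2f}{tags[source]}")
```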
The use of these customs will be illustrated in succeeding chapters.

If  any  words  can  be  put  together  to  summarize  the  implications  of 
this  section,  they  are  as  follows:  Do  not  forget  the  reader.  Remember,  you 
are  writing  not  for  yourself  but  for  others.  Anything  you  can  do  to  make 
your  assumptions,  procedures,  results,  analyses,  and  conclusions  more 
understandable  will  add  to  the  value  of  your  research. 

Problems 

11.1  What  are  the  proper  objectives  of  analyses  of  variance  (using  experi 
mental  or  survey  data) ;  that  is,  for  what  purposes  may  we  properly 
use  analyses  of  variance? 

11.2  Forty  technicians  were  available  to  investigate  5  methods  of  deter 
mining  the  iron  content  of  a  certain  chemical  mixture.  Eight  of  the 
technicians  used  method  No.  1,  8  used  No.  2,  and  so  on.  The  assign 
ment  of  technicians  to  methods  was  performed  in  a  random  manner. 
Each  technician  made  only  one  determination.  Given  that:  (1)  the 
total  of  the  40  observations  was  80,  (2)  the  among  methods  mean 
square  was  6,  and  (3)  the  pooled  variance  among  technicians  within 
methods was 8, fill in the following ANOVA table. (NOTE: omit the
spaces marked X.)

Source of Variation        Degrees of   Sum of     Mean      Expected
                           Freedom      Squares    Square    Mean Square
Mean                                                         X
Among methods
Among technicians
  within methods
Total                                              X         X

11.3  Given the following abbreviated ANOVA:

Source of Variation          Degrees of   Sum of     Mean      Expected
                             Freedom      Squares    Square    Mean Square
Among treatments               4           244        61       $\sigma^2 + 7\sum_{i=1}^{5}\tau_i^2/4$
Among experimental units
  within treatments           30           270         9       $\sigma^2$

(a) Write out the appropriate model.

(b) State the null hypothesis, both in words and symbolically, that
    the experiment was probably designed to test.

(c) Test the hypothesis given in the answer to (b) using a probability
    of Type I error equal to .05.
11.4  A process is designed to produce a fishline that will have a
"15-lb.-test" rating. The braided line may be treated with 4 different water-
proofings.  The  hypothesis  is  that  the  4  treatments  have  the  same, 
if  any,  effect  on  the  test  rating  of  the  cord.  Twenty  samples  of  each 
type  of  treated  cord  are  tested  for  breaking  strength.  Assuming  that 
analysis  of  variance  is  a  valid  technique  to  use  in  this  case,  set  up  the 
appropriate  table  showing  the  proper  subdivision  of  the  degrees  of 
freedom.  Discuss  any  further  analyses  that  might  be  useful  in  investi 
gating  the  treatments. 

11.5  It is desired to test 10 different baking temperatures when we
use a standard cake mix. Fifty sample batches of mix are prepared, and
5 are assigned at random to each of the 10 temperatures. Six judges
score the cakes, and the average score is recorded for each cake. Give
the proper subdivision of the degrees of freedom, and write out the
mathematical model assumed. State the hypothesis to be tested. Discuss
and evaluate the method of analysis.

11.6  Four methods of performing a certain operation have been tried
and we have 10 observations for each method. The mean productivities
under each method are 60, 70, 80, and 90, respectively. Not having
the original data from which to calculate the sums of squares, we
assume that the coefficient of variation (square root of the pooled
estimate of $\sigma^2$ divided by the average of all observations) is
0.1. On this assumption, test the hypothesis that the "method
population means" are equal.
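
One way to organize the computation in Problem 11.6 is sketched below in Python. It assumes that, with equal numbers of observations per method, the average of all observations is the plain average of the four method means, and that the pooled variance follows from the stated coefficient of variation. The names are illustrative and the F-ratio still has to be compared with a tabled value.

```python
# Sketch for Problem 11.6: one-way ANOVA from group means and an
# assumed coefficient of variation (no raw data available).
means = [60.0, 70.0, 80.0, 90.0]
n_per_method = 10
coeff_variation = 0.1

# Equal group sizes, so the grand mean is the average of the means.
grand_mean = sum(means) / len(means)

# CV = sqrt(pooled variance) / grand mean, solved for the variance.
pooled_variance = (coeff_variation * grand_mean) ** 2

ss_among = n_per_method * sum((m - grand_mean) ** 2 for m in means)
df_among = len(means) - 1
df_within = len(means) * (n_per_method - 1)
ms_among = ss_among / df_among

f_ratio = ms_among / pooled_variance
print(f"F = {f_ratio:.2f} on ({df_among}, {df_within}) d.f.")
```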

11.7  Given  that  the  means  of  10  individuals  in  each  of  5  groups  are  30,  32, 
34,  36,  and  38,  and  that  the  variance  of  a  group  mean  is  8,  compute 
the  analysis  of  variance. 

11.8  An  investigation  to  study  the  variation  in  average  daily  gains  made 
by  pigs  among  and  within  litters  when  fed  the  same  ration  gave  the 
following  results: 


Source of Variation               Degrees of    Mean
                                  Freedom       Square

Among litters                         29        0.0576
Among pigs in the same litter        180        0.0144

How  would  you  use  this  information  to  design  experiments  to  test  the 
effects  of  different  rations  on  average  daily  gains? 

11.9  Community X and community Y are two neighboring small towns.
Community X is supplied with electricity by a private power company,
while community Y operates a municipally owned but inefficient
high-cost power plant. As a result, cost of electricity to home users
is higher in community Y than in community X; for example, the charge
for the first 50 watts is $3.00 in X and $4.50 in Y. A random sample
of household meter readings for the same month was taken in each
community. The following values were obtained:


X Sample
(16 Observations)
(Kw-hours used)

Y Sample
(21 Observations)
(Kw-hours used)
28                      16 

6                      12 

12                     22 

36                      18 

14                     28 

24                      18 

4                       4 

58                      16 

16                     22 

60                      22 

28                       4 

6                      14 

30                     34 

14                      16 

76                     30 

54                     26 

22                      44 

16                      58 

18 

Analyze these data in two ways:

(a)  Compare home consumption of electricity in the two communities
by means of the comparison of two groups using "Student's" t.

(b)  Prepare an analysis of variance of these two samples.

11.10  It is suspected that five filling machines in a certain plant
are filling cans to different levels. Random samples of the production
from each machine were taken, with the following results:


Machine 

A 

B 

C 

D 

E 

11.95 

12.18 

12.16 

12.25 

12.10 

12.00 

12.11 

12.15 

12.30 

12.04 

12.25 

12.08 

12.10 

12.02 

12.10 

12.02 

Analyze  the  data  and  state  your  conclusions. 

11.11  The amount of carbon used in the manufacture of steel is
assumed to have an effect on the tensile strength of the steel. Given
the following data, perform the appropriate analysis and interpret
your results. The tensile strengths of six specimens of steel for each
of three different percentages of carbon are shown. (The data have
been coded for easy calculation.)

        Percentage of Carbon

    0.10       0.20       0.30

     23         42         47
     36         26         43
     31         47         43
     33         34         39
     31         37         42
     31         31         35

11.12  A  public  utility  company  has  a  stock  of  voltmeters  which  are  used 
interchangeably  by  the  employees.  The  question  arises  as  to  whether 
all  the  voltmeters  are  homogeneous.  Since  it  would  be  too  expensive 
to  check  all  the  meters,  a  random  sample  of  6  meters  is  obtained  and 
all  6  are  read  three  times  while  being  subjected  to  a  constant  voltage. 
The  following  data,  expressed  as  deviations  from  the  test  voltage, 
were  recorded.  Analyze  and  interpret. 

Meter 


0.95 

0.33 

—  2.15 

—  1.20 

1.80 

—  1.05 

1.06 

—  1.46 

1.70 

0.62 

0.88 

—  0.65 

1.96 

0.20 

0.48 

1.50 

0.20 

0.80 

11.13  An experiment had for its objective the evaluation of variance
components for the variation in ascorbic acid concentration (mg. per
100 g.) in turnip greens. Two leaves were taken from near the center
of each of 5 plants. Ascorbic acid concentration was determined for
each leaf. This was repeated on each of 6 days, a new selection of
plants being obtained each day. The following data were collected:


Day    Leaf                     Plant
                1       2       3       4       5

 1      A      9.1     7.3     7.3    10.7     7.7
        B      7.3     9.0     8.9    12.7     9.4
 2      A     12.6     9.1    10.9     8.0     8.9
        B     14.5    10.8    12.8     9.8    10.7
 3      A      7.3     6.6     5.2     5.3     6.7
        B      9.0     8.4     6.9     6.8     8.3
 4      A      6.0     8.0     6.8     9.1     8.4
        B      7.4     9.7     8.6    11.2    10.3
 5      A     10.8     9.3     7.3     9.3    10.4
        B     12.5    11.0     8.9    11.2    12.0
 6      A     10.6    10.9    10.4    13.1     7.7
        B     12.3    12.8    12.1    14.6     9.4

Plants, days, and leaves are to be considered as random variables
(there might be some question about days). Calculate the analysis of
variance and evaluate the variance components for leaves of the same
plant, plants of the same day, and days.

11.14  Suppose we have the mathematical model

$Y_{ijk} = \mu + \tau_i + \epsilon_{ij} + \delta_{ijk}$,   i = 1, 2, 3, 4;  j = 1, 2;  k = 1, 2,

where $\tau_i$ is the true effect of the ith treatment, $\epsilon_{ij}$
is the effect of the jth experimental unit subjected to the ith
treatment, and $\delta_{ijk}$ is the kth determination on the (ij)th
experimental unit. We wish to test the hypothesis H: $\tau_i = 0$ for
all i. The following values are known:

$T_1 = 8$      $E_{11} = 3$      $E_{12} = 5$
$T_2 = 7$      $E_{21} = 3$      $E_{22} = 4$
$T_3 = 10$     $E_{31} = 2$      $E_{32} = 8$
$T_4 = 7$      $E_{41} = 5$      $E_{42} = 2$

and $\sum y^2 = 18$. Complete the appropriate analysis of variance,
test the hypothesis, and interpret your results.
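
The partition of the sums of squares asked for in Problem 11.14 can be sketched in Python as follows. The sketch rests on two assumptions that are not stated outright in the problem: the known values are read as treatment totals (8, 7, 10, 7) and experimental unit totals (3, 5; 3, 4; 2, 8; 5, 2), and the quantity given as 18 is taken to be the total sum of squares about the general mean. Treatments are tested against experimental units within treatments.

```python
# Sketch for Problem 11.14: nested ANOVA from totals.
# Assumption: 18 is the corrected total sum of squares.
treatment_totals = [8.0, 7.0, 10.0, 7.0]
unit_totals = [[3.0, 5.0], [3.0, 4.0], [2.0, 8.0], [5.0, 2.0]]
total_ss_corrected = 18.0

t = len(treatment_totals)   # treatments
n = len(unit_totals[0])     # experimental units per treatment
m = 2                       # determinations per experimental unit

grand_total = sum(treatment_totals)
correction = grand_total ** 2 / (t * n * m)

ss_treatments = sum(T ** 2 for T in treatment_totals) / (n * m) - correction
ss_units = (
    sum(E ** 2 for row in unit_totals for E in row) / m
    - correction
    - ss_treatments
)
ss_determinations = total_ss_corrected - ss_treatments - ss_units

df_treatments = t - 1
df_units = t * (n - 1)
df_determinations = t * n * (m - 1)

ms_treatments = ss_treatments / df_treatments
ms_units = ss_units / df_units
ms_determinations = ss_determinations / df_determinations

# Treatments are tested against experimental units within treatments.
f_ratio = ms_treatments / ms_units
print(f"F = {f_ratio:.2f} on ({df_treatments}, {df_units}) d.f.")
```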

11.15  Given the following abbreviated ANOVA of data collected from an
experiment involving 6 treatments, 10 experimental units per
treatment, and 3 determinations per experimental unit:

Source of Variation               Degrees of   Mean      Expected
                                  Freedom      Square    Mean Square

Treatments                             5       12,489    $\sigma_\delta^2 + 3\sigma^2 + 30\sum_{i=1}^{6}\tau_i^2/5$
Exp. units within treatments          54        3,339    $\sigma_\delta^2 + 3\sigma^2$
Det. per experimental unit           120          627    $\sigma_\delta^2$


(a)  Write out the model assumed, stating explicitly what each term
represents.

(b)  Test the hypothesis that the 6 treatments have the same
population mean.

(c)  Compute the variance of a treatment mean.

(d)  Given that the sample mean for treatment No. 3 is 193.7, compute
and interpret the 95 per cent confidence interval for estimating the
true population mean of treatment No. 3.

(e)  Assuming that the estimates of the components of variance would
remain unchanged, would it be more or less efficient to use 9
experimental units per treatment and 4 determinations per
experimental unit? Show all calculations necessary to support your
answer. What is the gain or loss in information?
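
Parts (c) and (e) of Problem 11.15 turn on the variance of a treatment mean under subsampling. The Python sketch below shows one way to set up the calculation. It assumes the mean squares 3,339 (experimental units) and 627 (determinations) from the abbreviated ANOVA, estimates the components by equating mean squares to their expectations, and measures relative efficiency as the ratio of the two variances. The function name is hypothetical.

```python
# Sketch for Problem 11.15(c) and (e): variance of a treatment mean
# with n experimental units and m determinations per unit.
ms_units = 3339.0          # exp. units within treatments
ms_determinations = 627.0  # determinations per exp. unit
m_current = 3              # determinations per unit in the experiment

# Components of variance from the expected mean squares.
var_determination = ms_determinations
var_unit = (ms_units - ms_determinations) / m_current


def variance_of_treatment_mean(n_units: int, n_det: int) -> float:
    """Estimated variance of a treatment mean for a given allocation."""
    return var_unit / n_units + var_determination / (n_units * n_det)


current = variance_of_treatment_mean(10, 3)   # design actually used
proposed = variance_of_treatment_mean(9, 4)   # alternative in part (e)

# Values below 1 mean the proposed allocation is less efficient.
relative_efficiency = current / proposed
print(f"current  = {current:.2f}")
print(f"proposed = {proposed:.2f}")
print(f"relative efficiency = {relative_efficiency:.3f}")
```

For the design actually used, the same variance is obtained directly as the experimental unit mean square divided by 30.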

11.16  We conducted a completely randomized experiment to study some
chemical characteristics of 5 varieties of oats. We assigned each
variety at random to 6 plots, making a total of 30 plots. Instead of
harvesting the entire plot, we selected at random 8 3-by-3-foot
samples from each plot. For each sample we made 3 chemical
determinations. Indicate the proper complete subdivision of the total
degrees of freedom. Give the expected mean square for each source of
variation. Indicate the proper F-test to test the hypothesis that the
population means for the 5 varieties are equal.

11.17  Given the following abbreviated ANOVA:


Source of Variation                        Degrees of    Mean
                                           Freedom       Square

Groups                                          3         600
Experimental units within groups               36         120
Determinations per experimental unit           80          12

(a)  Give the expected mean squares, assuming that we are interested
in just these groups but that experimental units and determinations
are random variables.

(b)  Test the hypothesis that the group population means are equal.
Interpret your result.

(c)  Compute the variance of a group mean (per determination).
11.18      Given  the  following  abbreviated  ANOVA: 


Source of Variation                        Degrees of    Mean
                                           Freedom       Square

Among treatments                                4          20
Experimental units within treatments           15          15
Determinations per experimental unit           20           4

Obtain estimates of all the components of variance, and interpret
each in terms of the model
$Y_{ijk} = \mu + \tau_i + \epsilon_{ij} + \delta_{ijk}$, stating
explicitly all assumptions that you make.
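
A minimal Python sketch of the estimation in Problem 11.18 follows. It assumes a balanced design, so that the numbers of treatments, experimental units, and determinations can be recovered from the degrees of freedom (4, 15, and 20), and it treats the treatment term as a variance component purely for illustration. If treatments are regarded as fixed, the last quantity estimates the mean squared treatment effect instead.

```python
# Sketch for Problem 11.18: components of variance from mean squares.
ms_treatments, df_treatments = 20.0, 4
ms_units, df_units = 15.0, 15
ms_determinations, df_determinations = 4.0, 20

# Recover the (assumed balanced) design from the degrees of freedom.
t = df_treatments + 1                   # treatments
n = df_units // t + 1                   # units per treatment
m = df_determinations // (t * n) + 1    # determinations per unit

var_determination = ms_determinations
var_unit = (ms_units - ms_determinations) / m
var_treatment = (ms_treatments - ms_units) / (n * m)

print(f"t = {t}, n = {n}, m = {m}")
print(f"determinations: {var_determination}")
print(f"experimental units: {var_unit}")
print(f"treatments: {var_treatment}")
```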
11.19      Given  the  following  abbreviated  ANOVA: 


Source of Variation       Degrees of   Sum of    Mean     Expected Mean
                          Freedom      Squares   Square   Square

Treatments                     3        1800      600     $\sigma_\delta^2 + 3\sigma^2 + 30\sigma_\tau^2$
Experimental units
  within treatments           36        3600      100     $\sigma_\delta^2 + 3\sigma^2$
Determinations per
  experimental unit           80         960       12     $\sigma_\delta^2$

(a)  Compute the variance of a treatment mean.

(b)  Test the null hypothesis H: $\sigma_\tau^2 = 0$, and interpret
your answer.

(c)  The sample mean of treatment No. 1 is given to be 80. Compute
a 95 per cent confidence interval for estimating the true population
mean of treatment No. 1.

11.20  Given the following abbreviated ANOVA:


Source of Variation                        Degrees of    Mean
                                           Freedom       Square

Treatments                                      4         960
Experimental units within treatments           35         320
Determinations per experimental unit           40          20


(a)  Test the hypothesis that the population treatment means are all
equal. Interpret your answer.

(b)  Give the expected mean squares in the above analysis of variance.

(c)  Compute the variance of a treatment mean.

(d)  Estimate the gain or loss in information if the above experiment
were to be repeated with 10 experimental units per treatment and a
single determination per experimental unit. State all your
assumptions.

11.21  Given the following abbreviated ANOVA:


Source of Variation                        Degrees of    Mean
                                           Freedom       Square

Among treatments                                9         570
Experimental units within treatments           90         190
Among determinations on same
  experimental unit                           200          10

(a)  Compute F to test the hypothesis that the 10 treatments have
the same true effect.

(b)  Compute the variance of a treatment mean per determination in
the above experiment.

(c)  Assuming that the estimates of the components of variance
would not change, estimate the gain or loss in information in
estimating the treatment means if 20 experimental units per
treatment were selected with a single determination on each
experimental unit in repeating the experiment.

11.22  The cities and towns of Arizona have been allocated to 5 strata
(or groups) according to population. In each stratum we select at
random 10 cities (or towns); in each of these cities we select at
random 4 blocks; and in each of these blocks we select at random 2
households. Indicate the proper subdivision of the degrees of freedom
for a complete analysis of variance of some item such as the average
income of the head of each household.

11.23  Set up the analysis of variance table and show the degrees of
freedom for the following experiment: Six spray treatments are applied
completely at random in an orchard of 100 trees (all being used). Each
treatment is applied to sets of 2 trees; then the yield of each tree
is estimated by obtaining 4 samples around the perimeter. Note that
all but 2 treatments contain 8 sets; the remaining 2 treatments
contain 9 sets. Show the expected mean squares.

11.24  The following abbreviated analysis of variance was prepared
from chemical determinations made on samples of a legume hay. The hay
samples were obtained from a completely randomized experiment
involving 16 treatments.

ANALYSIS OF VARIANCE OF CHEMICAL DETERMINATIONS

SAMPLES

Source of Variation                    Degrees of    Mean
                                       Freedom       Square

Treatments                                 15         550
Plots with same treatment                             220
Samples from plots treated alike          256          20

(a)  State a suitable hypothesis about these treatments, and make the
proper test. The experimenter wished to reject the hypothesis only
under the condition of a 1 per cent chance of a Type I error. What is
your conclusion?

(b)  Indicate the function of replication in this experiment for
studying the effects of various treatments on a legume hay.

(c)  How might the replication and sampling procedure for this
experiment be changed in order to increase replication without
changing the total number of samples to be analyzed?

(d)  Estimate the possible gain in relative efficiency for your
proposed change in the experiment.

(e)  What assumption is required for making this calculation?

11.25  In an effort to develop objective methods of estimating the
yield of corn, an experimental survey was conducted in a district of
central Iowa. A random selection of fields was made, and within those
fields 2 sampling units (consisting of 10 hills each) were selected at
random and the grain yield determined by harvesting and weighing. The
analysis of variance (on a 10-hill s.u. basis) is as follows:

Source of Variation       Degrees of    Sum of     Mean
                          Freedom       Squares    Square

Fields                        47        2098.7     44.65
S.u.'s within fields          48         554.5     11.55

How  much  information  would  have  been  lost  if  only  one  s.u.  per  field 
had  been  taken? 
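
The question of information lost in Problem 11.25 can be approached as sketched below in Python. The sketch assumes the mean squares 44.65 (fields) and 11.55 (sampling units within fields), 2 sampling units per field, the same number of fields under both plans, and information measured as the reciprocal of the variance of the estimated mean.

```python
# Sketch for Problem 11.25: information lost with 1 s.u. per field
# instead of 2, keeping the number of fields fixed.
ms_fields = 44.65
ms_within = 11.55
su_per_field = 2

# Components of variance from the expected mean squares.
var_within = ms_within
var_fields = (ms_fields - ms_within) / su_per_field

# Variance contributed by one field to the mean, under each plan.
var_two_su = var_fields + var_within / 2
var_one_su = var_fields + var_within

relative_information = var_two_su / var_one_su
information_lost = 1.0 - relative_information
print(f"between-field component = {var_fields:.2f}")
print(f"information lost = {100 * information_lost:.1f} per cent")
```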

11.26  Data from a sample survey of farms in the Midwest were to be
summarized by means of the analysis of variance. Eight types of
farming areas were included in the study, and within each area 5
counties were selected at random. Within each of the chosen counties,
20 farms were selected at random and farm management records taken
for each. A partial list of the summary calculations was as follows
for the item "farm income":

Total corrected sum of squares = 8,183,000
Sum of squares for among counties within areas = 352,000
Mean square for type of farming areas = 33,000.

(a)  Prepare and complete an analysis of variance for "farm income"
from the above information.

(b)  What is the variance of a type-of-farming area mean as
determined from your analysis of variance?

(c)  Suppose it had been decided to select only 2 counties in each
area and sample 50 farms in each county. What is the relative
efficiency of the plan used to the procedure suggested here?

(d)  The type-of-farming areas included in the study were arbitrarily
selected upon the basis of some known differences. Are the areas
different with respect to "farm income"?

(e)  Write out the expected values of the mean squares used in
answering (d) above.

11.27  Given the following abbreviated analysis of variance of the
data from a completely randomized experiment with 4 treatments, 8
experimental units per treatment, 3 samples per experimental unit, and
2 determinations (of some chemical or physical characteristic) per
sample:


Source of Variation                         Degrees of    Mean
                                            Freedom       Square

Treatments                                       3        19,200
Among experimental units treated alike          28         4,800
Among samples per experimental unit             64         2,400
Between determinations per sample               96         1,200

Estimate the gain or loss in efficiency in estimating the treatment
effects if we had used 12 experimental units per treatment, 2 samples
per experimental unit and 1 determination per sample.
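
For Problem 11.27, the comparison of the two allocations can be sketched in Python as follows. The sketch assumes the mean squares 4,800, 2,400, and 1,200 for experimental units, samples, and determinations, a balanced design with 8 units, 3 samples, and 2 determinations, and components of variance that stay unchanged under the new allocation. The function name is hypothetical.

```python
# Sketch for Problem 11.27: efficiency of an alternative allocation
# in a three-stage nested design.
ms_units, ms_samples, ms_determinations = 4800.0, 2400.0, 1200.0
n_samples_used, n_det_used = 3, 2

# Components of variance from the expected mean squares.
var_determination = ms_determinations
var_sample = (ms_samples - ms_determinations) / n_det_used
var_unit = (ms_units - ms_samples) / (n_samples_used * n_det_used)


def variance_of_treatment_mean(units: int, samples: int, dets: int) -> float:
    """Variance of a treatment mean for a given nested allocation."""
    return (
        var_unit / units
        + var_sample / (units * samples)
        + var_determination / (units * samples * dets)
    )


used = variance_of_treatment_mean(8, 3, 2)
proposed = variance_of_treatment_mean(12, 2, 1)

# Values below 1 indicate a loss in efficiency for the proposed plan.
relative_efficiency = used / proposed
print(f"used = {used:.2f}, proposed = {proposed:.2f}")
print(f"relative efficiency = {relative_efficiency:.3f}")
```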

11.28  Describe the assumptions underlying the application of the
analysis of variance technique.

(a)  Which of these assumptions can the research worker check for
any particular analysis?

(b)  Which assumption can be fulfilled by the research worker in an
experimental situation by appropriate procedures?

(c)  For what purposes do we employ the analysis of variance
technique?

(d)  What criterion should be applied for judging the validity of an
F-ratio obtained from an analysis of variance?

11.29  (a)  Explain in your own words the meaning in the analysis of
variance of (1) a variance component and (2) a fixed effect.

(b)  Consider the following abbreviated analysis of variance:

ANALYSIS OF VARIANCE OF CALORIES CONSUMED IN ONE DAY FOR A
SAMPLE OF IOWA WOMEN OVER THE AGE OF 30

Source of Variation                 Degrees of   Sum of         Mean
                                    Freedom      Squares        Square

Among zones                              2        16,960,000    8,480,000
Among counties in zones                 97        41,128,000      424,000
Between segments in counties
  in zones                             100        40,000,000      400,000
Among individuals in segments
  in counties in zones                 600       180,000,000      300,000

For this problem we shall assume that 4 individuals (women over 30)
were interviewed in each segment and that 2 segments were selected at
random in each county. The zones are open country, rural community,
and urban. Counties appearing in the sample for the zones were 50, 25,
and 25, respectively, yielding the 97 degrees of freedom for counties
in zones.

(1)  Estimate the variance components for individuals, segments, and
counties from this analysis of variance.

(2)  Test the hypothesis: Calories consumed on this day are the same
for all zones. Show the mean squares used in forming the F-ratio,
and indicate in general why these are the proper mean squares to
use for the test.
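
The calculations in Problem 11.29(b) can be sketched in Python as shown below. The sketch assumes mean squares of 8,480,000, 424,000, 400,000, and 300,000 for zones, counties, segments, and individuals, with 4 individuals per segment and 2 segments per county as stated, and it forms the F-ratio for zones with the counties-in-zones mean square as the denominator.

```python
# Sketch for Problem 11.29(b): variance components and test of zones.
ms_zones = 8_480_000.0
ms_counties = 424_000.0
ms_segments = 400_000.0
ms_individuals = 300_000.0

individuals_per_segment = 4
segments_per_county = 2

# Components of variance from the expected mean squares.
var_individuals = ms_individuals
var_segments = (ms_segments - ms_individuals) / individuals_per_segment
var_counties = (ms_counties - ms_segments) / (
    segments_per_county * individuals_per_segment
)

# Zones are tested against counties in zones.
f_zones = ms_zones / ms_counties
print(f"individuals: {var_individuals:,.0f}")
print(f"segments:    {var_segments:,.0f}")
print(f"counties:    {var_counties:,.0f}")
print(f"F for zones = {f_zones:.1f} on (2, 97) d.f.")
```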

11.30  With reference to Example 11.9 and Table 11.23, show that the
expected mean squares for the three comparisons are:

Efca  -h 


n4) 


n* 


11.31  With reference to Example 11.11 and Table 11.25, show that the
expected mean squares for the four comparisons are:


11.32  Given the additional information that, in Problem 11.10,
machine A is a standard machine and machines B, C, D, and E are
experimental models, modify the original analysis to assess the
relative performance of the five machines.

11.33  Apply the technique of Section 11.10 to the following problems:

(a)  11.9      (d)  11.12
(b)  11.10     (e)  11.13
(c)  11.11     (f)  11.14

11.34  If you did not use the technique of Section 11.11 in the
analysis of Problem 11.11, please do so now.

11.35  It is suspected that the age of a furnace used in curing
silicon wafers influences the percentage of defective items produced.
An experiment was conducted using four different furnaces and the data
given below were obtained. Analyze and interpret the data.

PERCENTAGE OF GOOD WAFERS IN 8 EXPERIMENTAL TRIALS
PER FURNACE (THE SAME NUMBER OF WAFERS WERE
USED IN EACH FURNACE IN EACH TRIAL)

                            Furnace
       A                B                C                D
 (age 1 year)    (age 2 years)    (age 3 years)    (age 4 years)

      95               95               80               70
      92               85               80               65
      92               92               82               70
      90               83               78               72
      92               83               77               72
      94               88               75               66
      92               89               78               50
      91               90               78               66
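
As a starting point for Problem 11.35, a one-way analysis of variance can be computed with a short Python function such as the one sketched here. The function name and return structure are hypothetical, the data are read as eight trials per furnace (columns A to D), and the sketch does not address whether the percentages should first be transformed or whether a trend with furnace age should be isolated; those choices are left to the analyst.

```python
# Sketch for Problem 11.35: one-way ANOVA from raw data.
def one_way_anova(groups):
    """Return sums of squares, d.f., mean squares, and F for one-way data."""
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n_total

    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


# Percentage of good wafers, eight trials per furnace.
furnace_a = [95, 92, 92, 90, 92, 94, 92, 91]
furnace_b = [95, 85, 92, 83, 83, 88, 89, 90]
furnace_c = [80, 80, 82, 78, 77, 75, 78, 78]
furnace_d = [70, 65, 70, 72, 72, 66, 50, 66]

result = one_way_anova([furnace_a, furnace_b, furnace_c, furnace_d])
for key, value in result.items():
    print(f"{key:<12}{value:>12.3f}")
```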

11.36  It is suspected that the environmental temperature in which
batteries are activated affects their activated life. Thirty
homogeneous batteries were tested, 6 at each of five temperatures, and
the data shown below were obtained. Analyze and interpret the data.

ACTIVATED LIFE IN SECONDS

              Temperature (°C.)
     0       25       50       75      100

    55       60       70       72       65
    55       61       72       72       66
    57       60       73       72       60
    54       60       68       70       64
    54       60       77       68       65
    56       60       77       69       65

11.37  It is suspected that both the machine on which bearings are
produced and the operator of the machine influence the critical
dimension, namely, the inside diameter. To check on this, the data
given below were obtained under normal production conditions. Analyze
and interpret the data.

Machine

1 

2 

3 

Operator 

A 

B 

C 

D

E

1.02 

1.03 

1.05 

1.03 

1.02 

1.03 

1.03 

1.06 

1.03 

1.03 

1.02

1.03 

1.04 

1.02 

1.04 

1.03 

1.06 

1.02 

1.07 

1.02 

1.06 

1.05 

11.38  An experiment was conducted to assess the effects of
temperature and humidity on the effective resistance of a standard
type of resistor. The following data were obtained. Analyze and
interpret the data.

CODED RESISTANCE VALUES

                           Temperature
                -20°F.        70°F.         160°F.
Humidity       10%   50%     10%   50%     10%   50%

                23    24      26    24      25    27
                24    24      25    25      26    26
                25    25      26    26      26    28
                24    26      26    26      28    28

11.39      Given  the  following  abbreviated  analysis  of  variance: 


ANALYSIS OF VARIANCE OF NET INCOME PER CROP ACRE

Source of Variation               Degrees of    Mean
                                  Freedom       Square

Between soil areas                     4         625
Soil conservation programs             3         400
SA X SCP                              12         225
Between farms in subclasses           80         100

(a)  Assuming both of the main classifications are fixed effects,
indicate the appropriate F-ratios for tests of the hypotheses: (1)
soil areas do not differ in income; (2) the soil conservation programs
have no effect on income per crop acre.

(b)  Assuming that soil areas were selected at random and, also, soil
conservation programs were selected at random from a larger number
(not entirely realistic, but possible), indicate the F-ratios for the
tests listed in (a) above.
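
The choice of denominators in Problem 11.39 can be illustrated with the short Python sketch below. It assumes the conventional rules for a two-way classification with equal subclass numbers: under the fixed-effects model both main effects are tested against the within-subclass mean square, and under the random-effects model both are tested against the interaction mean square. The mean squares are 625, 400, 225, and 100 as in the abbreviated table.

```python
# Sketch for Problem 11.39: F-ratios under two different models.
ms_soil_areas = 625.0
ms_programs = 400.0
ms_interaction = 225.0   # SA X SCP
ms_within = 100.0        # between farms in subclasses

# (a) Both classifications fixed: test against within-subclass error.
f_fixed = {
    "soil areas": ms_soil_areas / ms_within,
    "programs": ms_programs / ms_within,
}

# (b) Both classifications random: test against the interaction.
f_random = {
    "soil areas": ms_soil_areas / ms_interaction,
    "programs": ms_programs / ms_interaction,
}

for name in f_fixed:
    print(f"{name:<12} fixed: {f_fixed[name]:.2f}   random: {f_random[name]:.2f}")
```

The degrees of freedom change with the denominator as well: 80 for the within-subclass mean square and 12 for the interaction.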
11.40  In the accompanying table pasture-acres per farm for a sample
of 36 farms in Audubon County, Iowa, for the year 1934 are presented.
The sample consists of 3 farms from each soil-tenure grouping; there
are 12 such groups. It is expected that there may be some interaction
effects among the soil and tenure classes. We shall consider the
tenure grouping as a fixed effect and the soil groupings as random
sampling from a larger population. Perform the necessary calculations,
set up the analysis of variance table, and discuss the results. Also
examine the homogeneity of variance in the tenure groups by the
Bartlett test.

DATA  ON  PASTURE  ACREAGE,  AUDUBON  COUNTY,  IOWA,  1934 


Tenure  Group 

Soil  Group 

I 

II 

III 

IV 

Farm 
Owners      1  

37.0 
40.1 
57.0 

36.0 
52.0 
38.0 

72.6 
65,2 
71.0 

(Pasture  acr 
50.0 
28.6 
37.2 

42.0 

54.5 
58.0 

54.0 
58.0 
29.0 

es  per  farm) 
49.0 
43.7 
27.0 

50.9 
34.0 
43.8 

67.4 
32.5 
43.8 

56.0 
69.0 

54.7 

55.0 
41.0 
54.6 

63.0 
45.0 
60.0 

2  

3  

Tenants     1 

2 

3.     . 

Mixed        1  

2  

3  

11.41  Five  varieties  and  4  fertilizers  were  tested.  From  each  experimental 
plot  3  quadrats  were  selected  at  random  and  their  yields  recorded  as 
follows : 


                          Varieties
Fertilizers       1       2       3       4       5

                 57      26      39      23      48
     1           46      38      39      36      35
                 28      20      43      18      48

                 67      44      57      74      61
     2           72      68      61      47      60
                 66      64      61      69      75

                 95      92      91      98      78
     3           90      89      82      85      89
                 89      99      98      85      95

                 92      96      98      99      99
     4           88      95      93      90      98
                 99      99      98      98      99

(a)  Construct an analysis of variance table.

(b)  On the basis of the appropriate model, write the expected mean
squares conforming to the following assumptions:

(1)  varieties and fertilizers random selections;

(2)  varieties and fertilizers both given sets;

(3)  varieties a random selection, fertilizers a given set.

(c)  Test the hypothesis of equal variety means. Test the hypothesis
of equal fertilizer means.

(d)  Construct a table showing the means and their standard errors.

(e)  What conclusions do you reach as a result of this experiment?

11.42  A building superintendent wishes to compare the relative
performance ratings of various combinations of floor wax and length of
polishing time. Three waxes are to be investigated along with 3
polishing times. Eighteen homogeneous floor areas are selected and
2 are assigned at random to each of the 9 treatment combinations.
Analyze and evaluate the following data.


PERFORMANCE RATINGS
(HIGH IS BETTER THAN LOW)

Wax                      A                  B                  C
Polishing time
(in minutes)       15    30    45     15    30    45     15    30    45

                    7    7.5   8.2     7    7.2   7.1     8    9.2   9.6
                    8    7.4   8.6     7    7.6   7       8    9.4   9.5

11.43  An  experiment  was  performed  to  assess  the  effects  of  type  of  material 
and  heat  treatment  on  the  abrasive  wear  of  bearings.  Two  bearings 
were  tested  at  each  of  10  treatment  combinations.  Analyze  and 
interpret  the  following  data. 


AMOUNT OF WEAR (CODED DATA)

Material           A          B          C          D          E
Heat
treatment*       O    M     O    M     O    M     O    M     O    M

                23   30    42   45    37   39    41   44    20   24
                25   31    44   50    38   39    42   49    25   30

* O = oven dried; M = moisture saturated.

11.44  From each of 5 lots of insulating material, 10 lengthwise
specimens and 10 crosswise specimens are cut. The following table
gives the impact strength in foot-pounds from tests on the specimens.
Analyze and interpret the data.

                               Lot Number
Type of Cut          I        II       III       IV        V

                    1.15     1.16     0.79     0.96     0.49
                    0.84     0.85     0.68     0.82     0.61
                    0.88     1.00     0.64     0.98     0.59
                    0.91     1.08     0.72     0.93     0.51
Lengthwise          0.86     0.80     0.63     0.81     0.53
specimens           0.88     1.01     0.59     0.79     0.72
                    0.92     1.14     0.81     0.79     0.67
                    0.87     0.87     0.65     0.86     0.47
                    0.93     0.97     0.64     0.84     0.44
                    0.95     1.09     0.75     0.92     0.48

                    0.89     0.86     0.52     0.86     0.52
                    0.69     1.17     0.52     1.06     0.53
                    0.46     1.18     0.80     0.81     0.47
                    0.85     1.32     0.64     0.97     0.47
Crosswise           0.73     1.03     0.63     0.90     0.57
specimens           0.67     0.84     0.58     0.93     0.54
                    0.78     0.89     0.65     0.87     0.56
                    0.77     0.84     0.60     0.88     0.55
                    0.80     1.03     0.71     0.89     0.45
                    0.79     1.06     0.59     0.82     0.60

11.45  Five batches of ground meat are charged consecutively into a
rotary filling machine for packing into cans. The machine has 6
filling cylinders. Three filled cans are taken from each cylinder at
random while each batch is being run. The coded weights of the filled
cans are given below. Analyze and interpret the data.

Cylinder

Batch 

1 

2 

3 

4 

5 

1 

1 

4 

6 

3 

1 

1 

3 

3 

1 

3 

2 

5 

7 

3 

3 

2 

—  1 

—  2 

3 

2 

1 

3 

1 

1 

0 

0 

—  1 

0 

5 

1 

1 

3 

1 

2 

2 

1 

3 

1 

0 

4 

3 

3 

1 

1 

3 

3 

3 

4 

—  2 

—  2 

3 

0 

0 

3 

0 

3 

0 

1 

0 

1 

4 

2 

1 

5 

1 

2 

0 

1 

—  2 

1 

1 

1 

0 

3 

•* 

5 

2 

_____  i 

1 

6 

0 

0 

3 

3 

3 

1 

0 

3 

0 

1 

1 

3 

4 

2 

2 

11.46  The following data on the density of small bricks resulted from
an experiment involving 3 different sizes of powder particles, 3
pressures, and 3 temperatures of firing. The 27 combinations of this
3 X 3 X 3 factorial were run in duplicate. Analyze and interpret the
following coded data.

Size      Pressure                  Temperature
                          1900          2000          2300

5-10         5.0        340  375      316  386      374  350
            12.5        388  370      338  214      334  366
            20.0        378  378      348  378      380  398

10-15        5.0        260  244      388  304      266  234
            12.5        322  342      300  420      234  258
            20.0        330  298      260  366      350  284

15-20        5.0        134  140      146  194      152  212
            12.5        186   30      412  428      194  208
            20.0         40  210      436  490      230  254
11.47  During the manufacture of sheets of building material the
permeability was determined for 3 sheets from each of 3 machines on
each day. The table below gives the logarithms of the permeability in
seconds for sheets selected from the 3 machines during a production
period of 9 days. The 3 machines received their raw materials from a
common store. Analyze and interpret the data.

Day    Machine        Log of Permeability

 1        1        1.404    1.346    1.618
          2        1.306    1.628    1.410
          3        1.932    1.674    1.399

 2        1        1.447    1.569    1.820
          2        1.241    1.185    1.516
          3        1.426    1.768    1.859

 3        1        1.914    1.477    1.894
          2        1.506    1.575    1.649
          3        1.382    1.690    1.361

 4        1        1.887    1.485    1.392
          2        1.673    1.372    1.114
          3        1.721    1.528    1.371

 5        1        1.772    1.728    1.545
          2        1.227    1.397    1.531
          3        1.320    1.489    1.336

 6        1        1.665    1.539    1.690
          2        1.404    1.452    1.627
          3        1.633    1.612    1.359

 7        1        1.918    1.931    2.129
          2        1.229    1.508    1.436
          3        1.328    1.802    1.385

 8        1        1.845    1.790    2.042
          2        1.583    1.627    1.282
          3        1.689    2.248    1.795

 9        1        1.540    1.428    1.704
          2        1.636    1.067    1.384
          3        1.703    1.370    1.839
CHAPTER 12

RANDOMIZED COMPLETE BLOCK
DESIGN

IN THIS CHAPTER, the most widely used of all experimental designs, the
randomized complete block design, will be discussed. The discussion
will follow closely the pattern adopted in Chapter 11 with, once again,
the greatest attention being given to methods of analysis.

12.1      DEFINITION  OF  A  RANDOMIZED  COMPLETE  BLOCK 
DESIGN 

A randomized complete block (RCB) design is a design in which: (1) the
experimental units are allocated to groups, or blocks, in such a way
that the experimental units within a block are relatively homogeneous
and that the number of experimental units within a block is equal to
the number of treatments being investigated, and (2) the treatments
are assigned at random to the experimental units within each block. In
the foregoing, the formation of the blocks reflects the researcher's
judgment as to potential differential responses from the various
experimental units while the randomization procedure acts as a
justification for the assumption of independence. (See Chapters 10 and 11.)

Example  12.1 

Six  varieties  of  oats  are  to  be  compared  with  reference  to  their  yields, 
and  30  experimental  plots  are  available  for  experimentation.  However, 
evidence  is  on  file  which  indicates  a  fertility  trend  running  from  north 
to  south,  the  northernmost  plots  of  ground  being  the  most  fertile.  Thus, 
it  seems  reasonable  to  group  the  plots  into  five  blocks  of  six  plots  each 
so  that  one  block  contains  the  most  fertile  plots,  the  next  block  contains 
the next most fertile group of plots, and so on down to the fifth
(southernmost) block which contains the least fertile plots. The six varieties
would  then  be  assigned  at  random  to  the  plots  within  each  block,  a  new 
randomization  being  made  in  each  block. 

Example  12.2 

An  experiment  is  to  be  designed  to  study  the  effect  of  environmental 
temperature  on  the  transfer  time  of  a  certain  type  of  electrical  gap. 
Twelve  different  temperatures  are  to  be  investigated.  A  check  of  the 
stockroom  indicates  that  gaps  are  available  from  six  different  production 
lots.  Since  it  has  previously  been  established  that  gaps  from  different 
lots  exhibit  different  characteristics,  even  when  subjected  to  the  same 
conditions,  some  blocking  is  desirable.  Accordingly,  12  gaps  are  selected 
at  random  from  each  of  the  six  production  lots,  and  each  such  set  of  12 
gaps is hereafter referred to as a block. Then the 12 temperatures are
assigned at random to the gaps within each block, a new randomization
being made in each block.

Example  12.3 

Ten  rations  are  to  be  tested  for  differences  in  producing  a  gain  in 
weight  for  steers.  Forty  steers  are  available  for  experimentation  and 
they  are  allocated  to  four  blocks  (10  steers  per  block)  on  the  basis  of 
their  weights  at  the  beginning  of  the  feeding  trial,  with  the  heaviest 
steers  being  in  one  block,  the  next  heaviest  steers  being  in  the  second 
block,  and  so  on.  The  10  treatments  (rations)  were  assigned  at  random 
to  the  steers  within  each  block,  as  shown  in  Figure  12.1. 


Block 1     H   B   F   A   C   I   E   J   D   G

Block 2     A   I   G   H   J   D   F   E   C   B

Block 3     E   A   C   I   B   H   D   G   J   F

Block 4     J   F   D   B   H   I   A   C   G   E

FIG. 12.1. Random arrangement of treatments as
described in Example 12.3.
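
The randomization described in Examples 12.1 through 12.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `randomize_rcb` and the `seed` argument are hypothetical and do not come from the text, and the arrangement produced will not reproduce Figure 12.1.

```python
import random


def randomize_rcb(treatments, n_blocks, seed=None):
    """Return one independent random ordering of the treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # a new randomization is made in each block
        layout.append(order)
    return layout


# Ten rations (A through J) and four blocks of steers, as in Example 12.3.
rations = list("ABCDEFGHIJ")
layout = randomize_rcb(rations, n_blocks=4, seed=12)
for block_no, order in enumerate(layout, start=1):
    print(f"Block {block_no}: {' '.join(order)}")
```

Every block receives each treatment exactly once; only the order within the block is left to chance.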


12.2      RANDOMIZED    COMPLETE    BLOCK    DESIGN    WITH 
ONE  OBSERVATION  PER  EXPERIMENTAL  UNIT 

The basic assumption for a randomized complete block design with
one observation per experimental unit is that the observations may be
represented by the linear statistical model

$$Y_{ij} = \mu + \beta_i + \tau_j + \epsilon_{ij}; \quad i = 1, \ldots, b; \quad j = 1, \ldots, t \qquad (12.1)$$

where $\mu$ is the true mean effect, $\beta_i$ is the true effect of the ith block,
$\tau_j$ is the true effect of the jth treatment, and $\epsilon_{ij}$ is the true effect of
the experimental unit in the ith block which is subjected to the jth
treatment. In addition,

$$\sum_{i=1}^{b} \beta_i = 0 \quad \text{and the } \epsilon_{ij} \text{ are NID}(0, \sigma^2).$$

As in Chapter 11, either Model I or Model II may be assumed with
respect to the $\tau_j$.

Using the symbolism of Table 12.1 and the following equations:

$$\sum Y^2 = \text{total sum of squares} = \sum_{i=1}^{b} \sum_{j=1}^{t} Y_{ij}^2, \qquad (12.2)$$

$$M_{yy} = \text{sum of squares due to the mean} = T^2/bt, \qquad (12.3)$$

$$B_{yy} = \text{among blocks sum of squares} = \sum_{i=1}^{b} B_i^2/t - M_{yy}, \qquad (12.4)$$

$$T_{yy} = \text{among treatments sum of squares} = \sum_{j=1}^{t} T_j^2/b - M_{yy}, \qquad (12.5)$$

and

$$E_{yy} = \text{experimental error sum of squares} = \sum Y^2 - M_{yy} - B_{yy} - T_{yy}, \qquad (12.6)$$

the ANOVA shown in Table 12.2 is obtained.

TABLE 12.1-Symbolic Representation of the Data in a Randomized
Complete Block Design With One Observation per Experimental Unit

                          Treatment
Block       1       ...      j       ...      t         Total      Mean

  1      $Y_{11}$   ...   $Y_{1j}$   ...   $Y_{1t}$     $B_1$    $\bar{Y}_{1.}$
  .
  i      $Y_{i1}$   ...   $Y_{ij}$   ...   $Y_{it}$     $B_i$    $\bar{Y}_{i.}$
  .
  b      $Y_{b1}$   ...   $Y_{bj}$   ...   $Y_{bt}$     $B_b$    $\bar{Y}_{b.}$

Total     $T_1$     ...    $T_j$     ...    $T_t$        $T$
Mean   $\bar{Y}_{.1}$ ... $\bar{Y}_{.j}$ ... $\bar{Y}_{.t}$        $\bar{Y}_{..}$

TABLE 12.2-Generalized ANOVA for a Randomized Complete Block
Design With One Observation per Experimental Unit: Model I

Source of              Degrees of      Sum of      Mean      Expected Mean
Variation              Freedom         Squares     Square    Square

Mean                   1               $M_{yy}$    M
Blocks                 b - 1           $B_{yy}$    B         $\sigma^2 + \frac{t}{b-1} \sum_{i=1}^{b} \beta_i^2$
Treatments             t - 1           $T_{yy}$    T         $\sigma^2 + \frac{b}{t-1} \sum_{j=1}^{t} \tau_j^2$
Experimental error     (b - 1)(t - 1)  $E_{yy}$    E         $\sigma^2$

Total                  bt              $\sum Y^2$

Following the line of reasoning developed in Section 11.2, it is easily
verified that the hypothesis $H: \tau_j = 0$ $(j = 1, \ldots, t)$ may be tested by
computing

$$F = \frac{\text{mean square for treatments}}{\text{experimental error mean square}} = T/E \qquad (12.7)$$

which, if H is true, is distributed as F with $\nu_1 = t - 1$ and $\nu_2 = (b - 1)(t - 1)$
degrees of freedom. If the value of F specified by Equation (12.7)
exceeds $F_{(1-\alpha)(\nu_1, \nu_2)}$, where $100\alpha$ per cent is the chosen significance level,
H will be rejected and the conclusion reached that there are significant
differences among the t treatments.

As before, it is also possible to estimate $\sigma^2$ by $s^2 = E$. Then, too, if
Model II had been assumed, $\sigma_\tau^2$ would be estimated by

$$s_\tau^2 = (T - E)/b. \qquad (12.8)$$

In either case, that is, Model I or Model II, the estimated variance of
a treatment mean is given by

$$\hat{V}(\bar{Y}_{.j}) = E/b \qquad (12.9)$$

and the standard error of a treatment mean is given by

$$s_{\bar{Y}_{.j}} = \sqrt{s^2/b}. \qquad (12.10)$$

A $100\gamma$ per cent confidence interval for estimating $\mu_j = \mu + \tau_j$ is then
found by calculating

$$L, U = \bar{Y}_{.j} \mp t_{[(1+\gamma)/2](\nu)} \sqrt{s^2/b} \qquad (12.11)$$

where $\nu = (b - 1)(t - 1)$.

Example  12.4 

The experiment described in Example 12.3 was performed and the
data of Table 12.3 were obtained. Use of Equations (12.2) through
(12.6) yields the ANOVA shown in Table 12.4. Because the F-ratio is
significant, we reject $H: \tau_j = 0$ $(j = 1, \ldots, 10)$ and decide that in all
likelihood the 10 treatments (rations) are not equally effective in
producing a gain in weight on steers. The treatment means and their
standard error, given in Table 12.3 for convenience, may then be used to
determine the best treatment (or treatments) and to indicate the
direction which future research should take.
TABLE 12.3-Gains in Weight (in Lbs.) of Forty Steers Fed Different Rations

(Data coded for easy calculation)

                       Block
Treatment      1      2      3      4      Total      Mean

A              2      3      3      5       13        3.25
B              5      4      5      5       19        4.75
C              8      7     10      9       34        8.50
D              6      5      5      2       18        4.50
E              1      2      1      2        6        1.50
F              3      5      7      8       23        5.75
G              8      8      7      8       31        7.75
H              6     12      2      5       25        6.25
I              4      5      6      3       18        4.50
J              4      4      2      3       13        3.25

Total         47     55     48     50      200
Mean         4.7    5.5    4.8    5.0                 5.0

Standard error of a treatment mean = $\sqrt{(3.43)/4} = 0.93$

TABLE 12.4-ANOVA for Experiment Described in Example 12.3 and
Discussed in Example 12.4 (Data in Table 12.3)

Source of             Degrees of    Sum of      Mean        Expected Mean                                      F-Ratio
Variation             Freedom       Squares     Square      Square

Mean                  1             1,000.0     1,000.00
Blocks                3                 3.8         1.26    $\sigma^2 + (10/3) \sum_{i=1}^{4} \beta_i^2$
Treatments            9               163.5        18.17    $\sigma^2 + (4/9) \sum_{j=1}^{10} \tau_j^2$     5.29**
Experimental error    27               92.7         3.43    $\sigma^2$

Total                 40            1,260.0

** Significant at $\alpha = 0.01$.
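
The arithmetic behind Table 12.4 can be retraced with a short script. What follows is a minimal Python sketch rather than anything given in the text: the function name `rcb_anova` and the dictionary layout are illustrative assumptions, while the data are those of Table 12.3 and the formulas are Equations (12.2) through (12.7).

```python
import math

# Gains from Table 12.3: one row per ration, one column per block (1 to 4).
gains = {
    "A": [2, 3, 3, 5], "B": [5, 4, 5, 5], "C": [8, 7, 10, 9],
    "D": [6, 5, 5, 2], "E": [1, 2, 1, 2], "F": [3, 5, 7, 8],
    "G": [8, 8, 7, 8], "H": [6, 12, 2, 5], "I": [4, 5, 6, 3],
    "J": [4, 4, 2, 3],
}


def rcb_anova(data):
    """ANOVA for an RCB design with one observation per experimental unit."""
    rows = list(data.values())
    t, b = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows)
    total_ss = sum(y * y for r in rows for y in r)            # (12.2)
    m_yy = grand ** 2 / (b * t)                               # (12.3)
    block_totals = [sum(r[i] for r in rows) for i in range(b)]
    b_yy = sum(x * x for x in block_totals) / t - m_yy        # (12.4)
    t_yy = sum(sum(r) ** 2 for r in rows) / b - m_yy          # (12.5)
    e_yy = total_ss - m_yy - b_yy - t_yy                      # (12.6)
    ms_t = t_yy / (t - 1)
    ms_e = e_yy / ((b - 1) * (t - 1))
    return {
        "M_yy": m_yy, "B_yy": b_yy, "T_yy": t_yy, "E_yy": e_yy,
        "T": ms_t, "E": ms_e,
        "F": ms_t / ms_e,                                     # (12.7)
        "se_mean": math.sqrt(ms_e / b),                       # (12.10)
    }


result = rcb_anova(gains)
for name, value in result.items():
    print(f"{name:8s} {value:10.2f}")
```

Comparing the computed F with a tabulated critical value is left out of the sketch, since that step needs an F table or a statistics library.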

Before moving along to the next topic connected with the analysis
of randomized complete block designs, there is one point that needs
discussion. That is, why do we not test $H': \beta_i = 0$ $(i = 1, \ldots, b)$?
Examination of Table 12.2 will show that the expected mean square
for blocks is of the same form as the expected mean square for
treatments, and this suggests that a logical procedure would be to test $H'$
by calculating $F = B/E$. Why is it, then, that the statistician says this
should not be done? The answer may be found by noting the manner
in which the randomization was performed. You will recall that the
treatments were assigned at random to the experimental units within
each block but that the blocks were formed in a decidedly nonrandom
fashion. Because of this feature of the randomized complete block
design, a statistical test of the block effect should not be performed. [NOTE:
In some cases where "blocks" are replaced by "replications" and where
the replications may be considered as random samples of all possible
replications (i.e., Model II with respect to replications), an F-test for
replications may be appropriate. However, even in such a case, the
F-test would be of less importance than the estimation of the
component of variance for replications.]

12.3 THE RELATION BETWEEN A RANDOMIZED COMPLETE
BLOCK DESIGN AND "STUDENT'S" t-TEST
OF $H: \mu_D = 0$ WHEN PAIRED OBSERVATIONS ARE
AVAILABLE

In Section 7.9, procedures were given for testing the hypothesis
$H: \mu_1 = \mu_2$ for three different cases. In Section 11.3 it was stated that
the analysis of a completely randomized design was equivalent to one
of these, namely, Case I. In this section we wish to show the
equivalence of the analysis of a randomized complete block design with two
treatments and "Student's" t-test of $H: \mu_1 = \mu_2$ when paired observations
are available, that is, to Case III. The crucial step is to note the
equivalence of "pairs" and "blocks." Once this association of terms is made,
the equivalence of the techniques may easily be demonstrated. Rather
than burden the reader with the details of the algebraic proof of the
equivalence, we will rely on the "power of an example" to convince him
of the truth of our claim. (NOTE: The reader should also reflect on the
obvious connection between the material of this section and the
contents of Section 9.13.)

Example  12.5 

Consider again the experiment described in Example 7.21 and the
data presented in Table 7.6. Denoting pairs (samples) by blocks and
utilizing Equations (12.2) through (12.6), the ANOVA of Table 12.5 is
obtained. It is seen that the F-value is significant at $\alpha = 0.05$, and this
permits us to reject the hypothesis that the two treatments (i.e., two
different steel balls) are doing an equivalent job. [NOTE: $F = 7.89
= t^2 = (2.81)^2$; see Example 7.21.] It may be verified that the two
treatment means are 54.2 and 46.2, respectively. Also, the standard error of
a treatment mean is determined to be $\sqrt{60.8/15} = 2.01$.
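
The identity $F = t^2$ noted in Example 12.5 holds for any set of paired observations, and it can be checked numerically. The sketch below is an illustration only: Table 7.6 is not reproduced in this chapter, so the six pairs used here are hypothetical values invented for the purpose, and the function names `paired_t` and `rcb_f` are likewise assumptions.

```python
import math
import statistics

# Hypothetical paired readings (NOT the data of Table 7.6).
first = [52, 61, 48, 57, 66, 50]
second = [47, 55, 49, 50, 58, 46]


def paired_t(x, y):
    """'Student's' t for H: mu_D = 0, computed from the pair differences."""
    d = [a - c for a, c in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def rcb_f(x, y):
    """F = T/E for an RCB design in which each pair is a block (t = 2)."""
    b, t = len(x), 2
    m_yy = (sum(x) + sum(y)) ** 2 / (b * t)
    total_ss = sum(v * v for v in x + y)
    b_yy = sum((a + c) ** 2 for a, c in zip(x, y)) / t - m_yy
    t_yy = (sum(x) ** 2 + sum(y) ** 2) / b - m_yy
    e_yy = total_ss - m_yy - b_yy - t_yy
    return (t_yy / (t - 1)) / (e_yy / ((b - 1) * (t - 1)))


t_value = paired_t(first, second)
f_value = rcb_f(first, second)
print(f"t = {t_value:.4f}, t^2 = {t_value ** 2:.4f}, F = {f_value:.4f}")
```

Both statistics have the same error degrees of freedom, b - 1, so the two tests lead to the same decision.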

TABLE 12.5-ANOVA for Experiment Described in Examples 7.21 and
12.5 (Data in Table 7.6)

Source of             Degrees of    Sum of       Mean        Expected                                          F-Ratio
Variation             Freedom       Squares      Square      Mean Square

Mean                  1             75,601.2     75,601.2
Blocks                14             2,649.8        189.3    $\sigma^2 + (2/14) \sum_{i=1}^{15} \beta_i^2$
Treatments            1                480.0        480.0    $\sigma^2 + (15/1) \sum_{j=1}^{2} \tau_j^2$     7.89*
Experimental error    14               851.0         60.8    $\sigma^2$

Total                 30            79,582.0

* Significant at $\alpha = 0.05$.

12.4 SUBSAMPLING IN A RANDOMIZED COMPLETE
BLOCK DESIGN

When subsampling is employed in a randomized complete block
design, the appropriate statistical model is

$$Y_{ijk} = \mu + \beta_i + \tau_j + \epsilon_{ij} + \eta_{ijk}; \quad i = 1, \ldots, b; \quad j = 1, \ldots, t; \quad k = 1, \ldots, n \qquad (12.12)$$

and the terms are defined in the usual manner. The various sums of
squares are found as follows:

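$$\sum Y^2 = \text{total sum of squares} = \sum_{i=1}^{b} \sum_{j=1}^{t} \sum_{k=1}^{n} Y_{ijk}^2, \qquad (12.13)$$

$$M_{yy} = \text{sum of squares due to the mean} = T^2/btn, \qquad (12.14)$$

$$\text{among cells sum of squares for the block} \times \text{treatment table} = \sum_{i=1}^{b} \sum_{j=1}^{t} T_{ij}^2/n - M_{yy}, \qquad (12.15)$$

$$S_{yy} = \text{sampling error sum of squares} = \sum Y^2 - M_{yy} - (\text{among cells sum of squares}), \qquad (12.16)$$

$$B_{yy} = \text{block sum of squares} = \sum_{i=1}^{b} B_i^2/tn - M_{yy}, \qquad (12.17)$$

$$T_{yy} = \text{treatment sum of squares} = \sum_{j=1}^{t} T_j^2/bn - M_{yy}, \qquad (12.18)$$

and

$$E_{yy} = \text{experimental error sum of squares} = (\text{among cells sum of squares}) - B_{yy} - T_{yy}, \qquad (12.19)$$

where

$$T = \text{grand total} = \sum_{i=1}^{b} \sum_{j=1}^{t} \sum_{k=1}^{n} Y_{ijk}, \qquad (12.20)$$

$$T_{ij} = \text{total of all observations in the ith block that were subjected to the jth treatment} = \sum_{k=1}^{n} Y_{ijk}, \qquad (12.21)$$

$$B_i = \text{total of all observations in the ith block} = \sum_{j=1}^{t} \sum_{k=1}^{n} Y_{ijk} = \sum_{j=1}^{t} T_{ij}, \qquad (12.22)$$

and

$$T_j = \text{total of all observations subjected to the jth treatment} = \sum_{i=1}^{b} \sum_{k=1}^{n} Y_{ijk} = \sum_{i=1}^{b} T_{ij}. \qquad (12.23)$$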


Using  the   preceding   results,   the   ANOVA   shown   in   Table    12.6   is 
obtained. 

TABLE 12.6-Generalized ANOVA for a Randomized Complete Block
Design With n Samples per Experimental Unit: Model I

Source of             Degrees of       Sum of       Mean      Expected                                                                   F-Ratio
Variation             Freedom          Squares      Square    Mean Square

Mean                  1                $M_{yy}$     M
Blocks                b - 1            $B_{yy}$     B         $\sigma_\eta^2 + n\sigma_\epsilon^2 + \frac{nt}{b-1} \sum_{i=1}^{b} \beta_i^2$
Treatments            t - 1            $T_{yy}$     T         $\sigma_\eta^2 + n\sigma_\epsilon^2 + \frac{bn}{t-1} \sum_{j=1}^{t} \tau_j^2$     T/E
Experimental error    (b - 1)(t - 1)   $E_{yy}$     E         $\sigma_\eta^2 + n\sigma_\epsilon^2$
Sampling error        bt(n - 1)        $S_{yy}$     S         $\sigma_\eta^2$

Total                 btn              $\sum Y^2$

Example 12.6

An  experiment  was  performed  to  assess  the  relative  effects  of  five 
fertilizers  on  the  yield  of  a  certain  variety  of  oats.  The  location  of  the 
30  experimental  plots  available  for  use  in  the  experimentation  was  such 
that  it  seemed  advisable  to  group  the  plots  into  six  blocks  of  five  plots 
each.  The  treatments  were  then  randomly  assigned  to  the  plots  within 
each  block.  At  the  end  of  the  growing  season  the  researcher  decided  to 
harvest  (for  purposes  of  analysis)  only  three  sample  quadrats  from 
each  plot.  The  data  of  Table  12.7  were  obtained  and  these  led  to  the 
ANOVA  shown  in  Table  12.8.  It  is  noted  that  the  five  fertilizer  means 
are  significantly  different  and,  thus,  it  is  most  important  that  the 
proper  fertilizer  be  recommended  to  the  farmer.  A  tabulation  of  the 
fertilizer  means,  together  with  the  appropriate  standard  error,  would 
be  of  great  help  in  reaching  the  correct  decision. 

TABLE 12.7-Coded Values of Yields From Ninety Sample Quadrats

                    Fertilizer Treatments
Blocks        1       2       3       4       5

1            57      67      95     102     123
             46      72      90      88     101
             28      66      89     109     113

2            26      44      92      96      93
             38      68      89      89     110
             20      64     106     106     115

3            39      57      91     102     112
             39      61      82      93     104
             43      61      98      98     112

4            23      74     105     103     120
             36      47      85      90     101
             18      69      85     105     111

5            48      61      78      99     113
             35      60      89      87     109
             48      75      95     113     111

6            50      68      85     117     124
             37      65      74      93     102
             19      61      80     107     118
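
The sums of squares of Equations (12.13) through (12.19) can be verified on the data of Table 12.7. The following Python sketch is offered as an illustration under stated assumptions: the nested-list layout `yields[block][sample][treatment]` and all variable names are choices made here, not notation from the text.

```python
# Table 12.7: yields[block][sample quadrat][fertilizer treatment].
yields = [
    [[57, 67, 95, 102, 123], [46, 72, 90, 88, 101], [28, 66, 89, 109, 113]],
    [[26, 44, 92, 96, 93], [38, 68, 89, 89, 110], [20, 64, 106, 106, 115]],
    [[39, 57, 91, 102, 112], [39, 61, 82, 93, 104], [43, 61, 98, 98, 112]],
    [[23, 74, 105, 103, 120], [36, 47, 85, 90, 101], [18, 69, 85, 105, 111]],
    [[48, 61, 78, 99, 113], [35, 60, 89, 87, 109], [48, 75, 95, 113, 111]],
    [[50, 68, 85, 117, 124], [37, 65, 74, 93, 102], [19, 61, 80, 107, 118]],
]
b, n, t = len(yields), len(yields[0]), len(yields[0][0])

obs = [y for block in yields for row in block for y in row]
grand = sum(obs)                                             # (12.20)
total_ss = sum(y * y for y in obs)                           # (12.13)
m_yy = grand ** 2 / (b * t * n)                              # (12.14)

# Cell totals T_ij for the block x treatment table            (12.21)
cell = [[sum(row[j] for row in block) for j in range(t)] for block in yields]
cells_ss = sum(c * c for r in cell for c in r) / n - m_yy    # (12.15)
s_yy = total_ss - m_yy - cells_ss                            # (12.16)
b_yy = sum(sum(r) ** 2 for r in cell) / (t * n) - m_yy       # (12.17)
treat_totals = [sum(cell[i][j] for i in range(b)) for j in range(t)]
t_yy = sum(x * x for x in treat_totals) / (b * n) - m_yy     # (12.18)
e_yy = cells_ss - b_yy - t_yy                                # (12.19)

ms_t = t_yy / (t - 1)
ms_e = e_yy / ((b - 1) * (t - 1))
ms_s = s_yy / (b * t * (n - 1))
f_ratio = ms_t / ms_e                                        # T/E of Table 12.6

print(f"Blocks SS      {b_yy:12.2f}")
print(f"Fertilizers SS {t_yy:12.2f}   MS {ms_t:10.2f}   F {f_ratio:8.2f}")
print(f"Exp. error SS  {e_yy:12.2f}   MS {ms_e:10.2f}")
print(f"Sampling SS    {s_yy:12.2f}   MS {ms_s:10.2f}")
```

Note that the treatment mean square is tested against the experimental error mean square, not against the sampling error mean square.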

TABLE 12.8-ANOVA for Data of Table 12.7; Discussion in Example 12.6

Source of             Degrees of    Sum of         Mean          Expected                                                                  F-Ratio
Variation             Freedom       Squares        Square        Mean Square

Mean                  1             573,921.88     573,921.88
Blocks                5                 354.19          70.84    $\sigma_\eta^2 + 3\sigma_\epsilon^2 + (15/5) \sum_{i=1}^{6} \beta_i^2$
Fertilizers           4              65,246.84      16,311.71    $\sigma_\eta^2 + 3\sigma_\epsilon^2 + (18/4) \sum_{j=1}^{5} \tau_j^2$     220.61**
Experimental error    20              1,478.76          73.94    $\sigma_\eta^2 + 3\sigma_\epsilon^2$
Sampling error        60              5,283.33          88.06    $\sigma_\eta^2$

Total                 90            646,285.00

** Significant at $\alpha = 0.01$.

12.5 PRELIMINARY TESTS OF SIGNIFICANCE

At this time I wish to digress for a few moments from the pattern
established in Chapter 11, and followed thus far in the present chapter,
to discuss a matter of considerable importance. This topic, namely,
the use of preliminary tests of significance, could just as easily have
been discussed in Chapter 11 or it could, without difficulty, be deferred
until later. However, Example 12.6 at the end of Section 12.4 brought
it to my attention, and thus we shall consider it at this point.

Some practitioners suggest that when, as in Example 12.6, the
experimental error mean square is less than the sampling error mean square,
the two sums of squares and their degrees of freedom be pooled. (The
same suggestion, i.e., to pool, is also frequently made when the
experimental error mean square exceeds, but not significantly, the sampling
error mean square.) That is, a pooled sum of squares $(E_{yy} + S_{yy})$ is
divided by the pooled degrees of freedom $[(b - 1)(t - 1) + bt(n - 1)]$, and
this new mean square is then used as the denominator in the F-ratio
for testing $H: \tau_j = 0$ $(j = 1, \ldots, t)$. If such a procedure is followed, the
statistical test of $H: \tau_j = 0$ $(j = 1, \ldots, t)$ will be based on a preceding,
or preliminary, test of significance, the hypothesis $H': \sigma_\epsilon^2 = 0$ being
tested by the preliminary test. Because such procedures are sometimes
followed, we must make certain that we understand their advantages
and disadvantages.

Problems of the above type have been investigated by Paull (10),
and it will pay us to spend a few moments reviewing his conclusions
and recommendations. To make the exposition easier to follow, we
shall tie it in with Table 12.6. Suppose the experimenter decides he will
always pool the two mean squares as indicated in the preceding
paragraph, that is, he will never perform a preliminary test of significance
concerning $H': \sigma_\epsilon^2 = 0$. If, in fact, $\sigma_\epsilon^2$ does equal 0, this procedure is fine.
But suppose $\sigma_\epsilon^2 > 0$; then the denominator in the final F-test (of $H: \tau_j = 0$
for all j) tends to be too small. Thus, in such a situation, the final F-test
tends to produce too many significant results when the null hypothesis
H is really true. This is bad, for it implies that, quoting Paull, "a test
which the research worker thinks is being made at the 5 per cent level
might actually be at, say, the 47 per cent level."1

The use of a preliminary test of significance is clearly an attempt to
guard against such a possibility. It will not, of course, eliminate such
occurrences entirely. To be useful, however, it should keep the actual
(effective) significance level achieved by the final (or dependent) F-test
close to the value at which the research worker desires to operate.
Another property which should be required of a preliminary test is that
it increase the power of the final F-test relative to the power of a
"never pool" test. The recommendations for the performance of
preliminary tests of significance for pooling mean squares in the analysis
of variance as formulated by Paull may be found in the reference
quoted. However, if the research worker follows the rule of "never
pooling," he will not go far wrong, and that is the rule we shall adopt
in this text.

12.6      ESTIMATION  OF  COMPONENTS  OF  VARIANCE  AND 
RELATIVE   EFFICIENCY 

The  problems  of  estimating  components  of  variance  and  predicting 
relative  efficiency  are  no  different  in  a  randomized  complete  block 
design  than  they  were  in  a  completely  randomized  design.  Thus,  they 
will  not  detain  us  long. 

As before, $\sigma_\eta^2$ is estimated by $s_\eta^2 = S$ (see Table 12.6) and $\sigma_\epsilon^2$ is
estimated by

$$s_\epsilon^2 = (E - S)/n. \qquad (12.24)$$

If, as in Table 12.8, the algebraic solution leads to a negative value of
$s_\epsilon^2$, it is customary to disregard the algebraic solution and use 0 as the
estimate. Of course, this is a biased procedure, but it is aesthetically
more satisfying since no population variance can be negative.

The relative efficiencies of various allocations within a randomized
complete block design will, as was the case when discussing a completely
randomized design, be determined by studying the variance of a
treatment mean. It may be shown that

$$\hat{V}(\bar{Y}_{.j.}) = \frac{\text{experimental error mean square}}{\text{number of observations per treatment}} = E/bn = (s_\eta^2 + n s_\epsilon^2)/bn. \qquad (12.25)$$

Thus, if the estimates of the components of variance remain unchanged,
the efficiency of this design relative to one in which we might use $b'$
blocks and $n'$ samples per experimental unit would be predicted by

$$\text{R.E. of old to new} = 100\,[\hat{V}'(\bar{Y}_{.j.})/\hat{V}(\bar{Y}_{.j.})] \text{ per cent} \qquad (12.26)$$

where

$$\hat{V}'(\bar{Y}_{.j.}) = (s_\eta^2 + n' s_\epsilon^2)/b'n'. \qquad (12.27)$$

1 A. E. Paull, "On a preliminary test for pooling mean squares in the analysis of
variance," Ann. Math. Stat., Vol. 21, 1950, p. 541.

Example  12.7 

Consider Table 12.9. It is easily seen that $s_\eta^2 = 10$ and $s_\epsilon^2 = 12$, yielding
$\hat{V}(\bar{Y}_{.j.}) = 58/24$. To determine the efficiency of the design used relative
to one involving four blocks and six samples per experimental unit,
we first calculate $\hat{V}'(\bar{Y}_{.j.}) = [10 + 6(12)]/4(6) = 82/24$. Thus, the
estimated relative efficiency is $100(82/24)/(58/24) = 141$ per cent.
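
The calculations of Equations (12.24) through (12.27) are brief enough to script. The sketch below is a hedged illustration: the function names `variance_components` and `var_treatment_mean` are assumptions made here, and the inputs are the mean squares reported in Tables 12.9 and 12.8.

```python
def variance_components(E, S, n):
    """Estimate (s_eta^2, s_eps^2) from the experimental error mean square E,
    the sampling error mean square S, and n samples per unit (12.24).
    A negative algebraic solution for s_eps^2 is replaced by 0."""
    return S, max((E - S) / n, 0.0)


def var_treatment_mean(s_eta2, s_eps2, b, n):
    """Estimated variance of a treatment mean, (12.25) and (12.27)."""
    return (s_eta2 + n * s_eps2) / (b * n)


# Example 12.7 (Table 12.9): b = 6 blocks, n = 4 samples per unit.
s_eta2, s_eps2 = variance_components(E=58, S=10, n=4)
v_old = var_treatment_mean(s_eta2, s_eps2, b=6, n=4)   # 58/24
v_new = var_treatment_mean(s_eta2, s_eps2, b=4, n=6)   # 82/24
re_percent = 100 * v_new / v_old                       # (12.26)
print(f"s_eta^2 = {s_eta2}, s_eps^2 = {s_eps2}, R.E. = {re_percent:.0f} per cent")

# Table 12.8: E = 73.94 is smaller than S = 88.06, so s_eps^2 is set to 0.
_, s_eps2_oats = variance_components(E=73.94, S=88.06, n=3)
print(f"Table 12.8 estimate of s_eps^2: {s_eps2_oats}")
```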

TABLE 12.9-Abbreviated ANOVA on Yields of Ten Varieties of Soybeans

Source of Variation     Degrees of    Sum of      Mean      Expected Mean
                        Freedom       Squares     Square    Square

Blocks                  5             3,000       600       $\sigma_\eta^2 + 4\sigma_\epsilon^2 + (40/5) \sum_{i=1}^{6} \beta_i^2$
Varieties               9             4,500       500       $\sigma_\eta^2 + 4\sigma_\epsilon^2 + (24/9) \sum_{j=1}^{10} \tau_j^2$
Experimental error      45            2,610        58       $\sigma_\eta^2 + 4\sigma_\epsilon^2$
Sampling error          180           1,800        10       $\sigma_\eta^2$

The  ideas  of  this  section  may  easily  be  extended  to  cases  involving 
many  stages  of  subsampling.  No  examples  will  be  given,  but  several 
of  the  problems  at  the  end  of  the  chapter  will  provide  the  necessary 
practice  in  the  manipulations. 

12.7  EFFICIENCY  OF  A  RANDOMIZED  COMPLETE 
BLOCK  DESIGN  RELATIVE  TO  A  COMPLETELY 
RANDOMIZED  DESIGN 

In some instances, the investigator wishes to estimate the efficiency
of his use of a randomized complete block design relative to what
might have happened if the treatments had been completely
randomized over all the experimental units. That is, he wishes to know if he
gained or lost in efficiency by grouping the experimental units into
homogeneous groups (blocks). One method of comparing the efficiency
of different designs is by use of uniformity data. Cochran (4) has
discussed this particular approach, and the reader is referred to his article
for further details. A second method of comparing efficiencies is to
consider algebraically what might have happened to the experimental
error mean square under complete randomization. To accomplish this,
it is convenient to proceed as though dummy treatments had been
applied to the experimental units. That is, we suppose that all
experimental units were subjected to the same (viz., no) treatment and then
proceed to estimate what the experimental error mean square would
have been under complete randomization. Following this line of
reasoning, and defining the efficiency of a randomized complete block design
relative to a completely randomized design by

$$\text{R.E.} = \frac{\text{estimated experimental error mean square for a CR design}}{\text{experimental error mean square from the RCB design}} \qquad (12.28)$$

it can be shown that

$$\text{R.E.} = \frac{(b - 1)B + b(t - 1)E}{(bt - 1)E} \qquad (12.29)$$

where B and E refer to the mean squares (in the randomized complete
block design) for blocks and experimental error, respectively.

Example  12.8 

Consider the ANOVA presented in Table 12.4. In this case, the
efficiency of the randomized complete block design relative to a completely
randomized design is estimated to be

$$\text{R.E.} = \frac{3(1.26) + 4(9)(3.43)}{39(3.43)} = 0.95.$$

It is seen that, because of the small magnitude of B relative to E, no
appreciable gain in efficiency resulted from the formation of the blocks
and the use of a randomized complete block analysis. That is, apart
from the "insurance" feature of the RCB design, the added effort was
not worthwhile.

Example  12.9 

The data in Table 12.10 resulted from a particular manufacturing
operation, the operation being performed by one of four different
machines. The data were collected on five different days, hereafter referred
to as blocks. Calculations yielded the abbreviated ANOVA shown in
Table 12.11. Proceeding according to Equation (12.29), the randomized
complete block design is estimated to be 131 per cent as efficient as a
completely randomized design would have been.

TABLE 12.10-Output From Four Machines Producing Part No. Z-15
(Output = number of units produced in one day)

                  Machine
Day        A       B       C       D

1         293     308     323     333
2         298     353     343     363
3         280     323     350     368
4         288     358     365     345
5         260     343     340     330

TABLE 12.11-Abbreviated ANOVA for Data of Table 12.10

Source of Variation     Degrees of    Sum of        Mean Square
                        Freedom       Squares

Blocks                  4              2,146.2         536.55
Treatments              3             13,444.8       4,481.60**
Experimental error      12             2,626.2         218.85

** Significant at $\alpha = 0.01$.
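
Equation (12.29) needs only the two mean squares and the dimensions of the design, so Examples 12.8 and 12.9 can be checked together. The function name `re_rcb_vs_cr` in this minimal Python sketch is a label chosen here for illustration, not one taken from the text.

```python
def re_rcb_vs_cr(B, E, b, t):
    """Efficiency of an RCB design relative to a CR design, Equation (12.29).

    B and E are the block and experimental error mean squares from the RCB
    analysis; b is the number of blocks and t the number of treatments.
    """
    return ((b - 1) * B + b * (t - 1) * E) / ((b * t - 1) * E)


# Example 12.8: Table 12.4 (4 blocks, 10 rations).
re_steers = re_rcb_vs_cr(B=1.26, E=3.43, b=4, t=10)

# Example 12.9: Table 12.11 (5 days, 4 machines).
re_machines = re_rcb_vs_cr(B=536.55, E=218.85, b=5, t=4)

print(f"Example 12.8: R.E. = {re_steers:.2f}")
print(f"Example 12.9: R.E. = {100 * re_machines:.0f} per cent")
```

When B equals E the ratio is exactly 1; blocking pays only when the block mean square exceeds the experimental error mean square.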

12.8      SELECTED  TREATMENT  COMPARISONS 

It is often desirable to make certain specific comparisons involving a
selected number of the treatments. For a completely randomized
design, such comparisons were discussed in Sections 11.8 and 11.9. In this
section the same general topic will be examined in conjunction with a
randomized complete block design.

A moment's thought should be sufficient to convince the reader that
the calculations will be performed in the same manner as indicated
earlier. For example, if a randomized complete block design with one
observation per experimental unit is involved, the sum of squares for
a particular contrast, $C_k$, would be given by

$$\text{sum of squares for } C_k = \left( \sum_{j=1}^{t} c_{jk} T_j \right)^2 \Big/ \; b \sum_{j=1}^{t} c_{jk}^2, \qquad (12.30)$$

where the $c_{jk}$ are the coefficients specifying the contrast. As before, if
$t - 1$ orthogonal contrasts are studied, the sum of the $t - 1$ individual
sums of squares will equal the treatment sum of squares.

Example  12.10 

Upon reading the complete description of the project referred to in
Example 12.9, certain additional information about the four machines
is brought to light. For example, machine A is the standard type of
machine now in use in the industry, while machines B, C, and D are
new designs which may be considered as possible substitutes. Further,
it is known that B and C contain moving parts made of some aluminum
alloy, while D does not have this feature. Also known from the
manufacturers' specifications is the fact that B is self-lubricating, while C is
not. Therefore, the comparisons represented symbolically in Table
12.12 seem to be indicated. Partitioning of the treatment sum of
squares is then carried out using Equation (12.30), and the abbreviated
ANOVA of Table 12.13 is obtained.
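
The partition described in Example 12.10 can be reproduced directly from the data of Table 12.10 with Equation (12.30). The Python sketch below is illustrative only: the names `output`, `contrasts`, and `contrast_ss` are assumptions, and the coefficients are those of the three comparisons just described (A against the rest, B and C against D, and B against C).

```python
# Table 12.10: one row per day (block); columns are machines A, B, C, D.
output = [
    [293, 308, 323, 333],
    [298, 353, 343, 363],
    [280, 323, 350, 368],
    [288, 358, 365, 345],
    [260, 343, 340, 330],
]
b = len(output)
machine_totals = [sum(day[j] for day in output) for j in range(4)]

# Coefficients c_jk for machines A, B, C, D in each comparison.
contrasts = {
    "A vs. rest": [3, -1, -1, -1],
    "B and C vs. D": [0, 1, 1, -2],
    "B vs. C": [0, -1, 1, 0],
}


def contrast_ss(coeffs, totals, b):
    """Sum of squares for one contrast, Equation (12.30)."""
    value = sum(c * total for c, total in zip(coeffs, totals))
    return value ** 2 / (b * sum(c * c for c in coeffs))


ss = {name: contrast_ss(c, machine_totals, b) for name, c in contrasts.items()}
for name, value in ss.items():
    print(f"{name:15s} {value:10.1f}")
print(f"{'Sum':15s} {sum(ss.values()):10.1f}")
```

Because the three contrasts are mutually orthogonal, their sums of squares add up to the treatment sum of squares of Table 12.11.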

TABLE 12.12-Symbolic Representation of the Selected Treatment
Comparisons Described in Example 12.10 (Data in Table 12.10)

                      Machine
Comparison      A      B      C      D

1              +3     -1     -1     -1
2               0     +1     +1     -2
3               0     -1     +1      0

TABLE 12.13-Abbreviated ANOVA for Data of Table 12.10 Showing the
Subdivision of the Treatment Sum of Squares
(Discussion in Example 12.10)

Source of Variation     Degrees of    Sum of        Mean Square
                        Freedom       Squares

Blocks                  4              2,146.2          536.55
Treatments
  A vs. rest            1             13,142.4       13,142.4**
  B and C vs. D         1                172.8          172.8
  B vs. C               1                129.6          129.6
Experimental error      12             2,626.2          218.85

** Significant at $\alpha = 0.01$.

12.9 SUBDIVISION OF THE EXPERIMENTAL ERROR
SUM OF SQUARES WHEN CONSIDERING SELECTED
TREATMENT COMPARISONS

Before proceeding to the next general topic connected with
randomized complete block designs, another digression seems desirable. This
time we will be concerned with the possibility of partitioning the
experimental error sum of squares. (NOTE: You will recall that this topic
was mentioned in the last paragraph of Section 11.13.)

When  is  such  a  procedure  in  order?  That  is,  when  should  the  experi 
mental  error  sum  of  squares  be  subdivided?  The  reason  for  sub 
dividing  Eyy  (if  such  a  procedure  is  adopted)  is  that  we  are  not  satisfied 
with  our  assumption  of  homogeneous  variances  of  the  e's.  If  such  an 
assumption  is  questioned — its  validity  may,  of  course,  be  investigated 
using  Bartlett's  test — and  if  selected  treatment  comparisons  are  being 
examined,  it  is  desirable  to  subdivide  Evy  in  a  manner  similar  to  the 
subdivision  of  Tyy.  Such  a  procedure  insures  that  any  particular  treat 
ment  comparison  will  be  tested  against  the  appropriate  error.  That  is, 
the  expected  value  of  the  "error  mean  square  for  testing  Ck"  will  con 
tain  the  same  components  of  variance  (other  than  the  treatment  effects) 
as  the  expected  value  of  the  mean  square  associated  with  C^  In  other 
words,  if  we  are  faced  with  different  variances  o-^  (f  =  l,  •  •  •  ,  Z>; 
y=l,  *  -  •  ,  £),  the  procedure  of  subdividing  Eyy  will  insure  that  the 
expected  mean  squares  for  a  particular  comparison  and  its  associated 
error  will  each  contain  the  same  linear  combination  of  the  c%j.  This,  of 
course,  provides  us  with  unbiased  tests  for  the  comparisons  under  in 
vestigation. 

Since the decision to implement the procedure (yet to be described)
for subdividing $E_{yy}$ is one which everyone will have to make at some
time or other, some guiding rule is needed. The following appears to
be a reasonable rule: If there is any serious doubt as to the
homogeneity of the variances, subdivide the experimental error sum of
squares as a precautionary measure. It should be noted, though, that if
the degrees of freedom associated with the various parts of $E_{yy}$ are
small, the resulting tests may be relatively insensitive (i.e., of low
discriminatory power). In practice both the following conditions are
usually true:

(1)  The degree of heterogeneity among the error variances is, as
a rule, not too great. Therefore, for most practical purposes,
the variances may be considered homogeneous.

(2)  The numbers of degrees of freedom associated with the parts
into which the experimental error sum of squares is subdivided
are generally quite small.

Consequently, the rule stated above should be modified to read:
Because of the truth of statements (1) and (2) above, it is generally
not wise to subdivide the experimental error sum of squares. However,
if the heterogeneity of variances is such that a subdivision is
necessary (regardless of the fact that small numbers of degrees of
freedom will result), the subdivision should be carried out in
accordance with the procedure to be explained in the next paragraph.

The method of subdividing the experimental error sum of squares in
agreement with a particular subdivision of the treatment sum of squares
is as follows:

(1)  Set up a table showing the values of the contrasts within each
block.

(2)  Calculate the portion of the experimental error sum of squares
for a particular contrast using

$$(E_k)_{yy} = \text{experimental error sum of squares for } C_k = \left[\sum_{i=1}^{b} C_{ki}^{2} - C_k^{2}/b\right] \Bigg/ \sum_{j=1}^{t} c_{jk}^{2}, \tag{12.31}$$

where

$$C_{ki} = \sum_{j=1}^{t} c_{jk} Y_{ij} \tag{12.32}$$

and

$$C_k = \sum_{j=1}^{t} c_{jk} T_j = \sum_{i=1}^{b} C_{ki}. \tag{12.33}$$

Example 12.11

Consider the experiment discussed in Examples 12.9 and 12.10. Using
Equations (12.32) and (12.33) in conjunction with Tables 12.10 and
12.12, we obtain Table 12.14. Then, using Equation (12.31), we get, for
example,

$$(E_3)_{yy} = \text{experimental error sum of squares associated with } C_3 \text{ (the comparison "B versus C")}$$

$$= \left[(15)^2 + (-10)^2 + (27)^2 + (7)^2 + (-3)^2 - (36)^2/5\right]/2 = 426.4.$$

Similarly, $(E_1)_{yy} = 1087.27$ and $(E_2)_{yy} = 1112.53$. Thus, we
finally obtain the abbreviated ANOVA shown in Table 12.15.

TABLE 12.14-Sums for the Selected Treatment Comparisons in Each
Block (Data of Table 12.10)

                   Comparison
Block         C1        C2        C3
  1          -85       -35       +15
  2         -165       -30       -10
  3         -201       -63       +27
  4         -204       +33        +7
  5         -233       +23        -3
Total       -888       -72       +36

TABLE 12.15-Abbreviated ANOVA for Data of Table 12.10 Showing
the Subdivision of the Experimental Error Sum of Squares

Source of Variation    Degrees of   Sum of       Mean         Expected Mean Square*
                       Freedom      Squares      Square
Blocks                     4         2,146.20      536.55     $\sigma^2 + (4/4)\sum_{i=1}^{5}\rho_i^2$
Treatments:
  A vs. rest               1        13,142.40   13,142.40     $\sigma^2 + (25/60)(3\tau_1 - \tau_2 - \tau_3 - \tau_4)^2$
  B and C vs. D            1           172.80      172.80     $\sigma^2 + (25/30)(\tau_2 + \tau_3 - 2\tau_4)^2$
  B vs. C                  1           129.60      129.60     $\sigma^2 + (25/10)(\tau_3 - \tau_2)^2$
Experimental error:
  A vs. rest               4         1,087.27      271.82     $\sigma^2$
  B and C vs. D            4         1,112.53      278.13     $\sigma^2$
  B vs. C                  4           426.40      106.60     $\sigma^2$

* The symbol $\sigma^2$ was used in each expected mean square as a matter
of convenience. If the variances are homogeneous, it is correct; if the
variances are not homogeneous, the symbol $\sigma^2$ would be replaced by
various linear combinations of the $\sigma_{ij}^2$.
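The subdivision of Example 12.11 can be reproduced from the block-by-block contrast values alone. The sketch below is an illustration under stated assumptions, not the book's own computing scheme: it takes the entries of Table 12.14 as given, uses the sums of squared coefficients implied by Table 12.12 (12, 6, and 2), and the helper name `subdivide` is hypothetical.

```python
# Contrast values within each block, C_ki, for blocks 1..5 (Table 12.14).
block_values = {
    "A vs. rest": [-85, -165, -201, -204, -233],
    "B and C vs. D": [-35, -30, -63, 33, 23],
    "B vs. C": [15, -10, 27, 7, -3],
}
# Sum of squared contrast coefficients for each comparison (Table 12.12).
sum_sq_coeffs = {"A vs. rest": 12, "B and C vs. D": 6, "B vs. C": 2}


def subdivide(values, sum_c2):
    """Return (treatment SS, error SS, error df, F) for one contrast."""
    b = len(values)
    c_k = sum(values)                                  # Equation (12.33)
    treatment_ss = c_k ** 2 / (b * sum_c2)             # Equation (12.30)
    error_ss = (sum(v * v for v in values) - c_k ** 2 / b) / sum_c2  # (12.31)
    error_df = b - 1
    f_ratio = treatment_ss / (error_ss / error_df)
    return treatment_ss, error_ss, error_df, f_ratio


results = {
    name: subdivide(block_values[name], sum_sq_coeffs[name])
    for name in block_values
}
pooled_error_ss = sum(r[1] for r in results.values())

for name, (t_ss, e_ss, e_df, f) in results.items():
    print(f"{name:15s} SS={t_ss:10.2f}  error SS={e_ss:9.2f}  df={e_df}  F={f:6.2f}")
print(f"pooled error SS = {pooled_error_ss:.2f}")
```

The three error components add back to the undivided experimental error sum of squares (2,626.2 with 12 degrees of freedom), which is a convenient arithmetic check. Each F-ratio here has only 1 and 4 degrees of freedom, which illustrates the loss of sensitivity the text warns about.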

The procedure explained and illustrated in this section can sometimes
be used to advantage when analyzing a particular set of data. However,
the reader should realize that it is a special technique and will,
therefore, be used only rarely.

12.10  ALL POSSIBLE COMPARISONS AMONG TREATMENT MEANS

Since the problem of making all possible comparisons among treatment
means in a randomized complete block design is handled in exactly the
same manner as similar comparisons were handled in a completely
randomized design, no additional discussion is necessary. The reader
is, therefore, referred to Section 11.10 for the appropriate details.

12.11  RESPONSE CURVES IN A RANDOMIZED
COMPLETE BLOCK DESIGN

Once again it is sufficient to state that the techniques explained in
the preceding chapter are directly applicable to the present situation.
Thus, the reader is referred to Section 11.11 for the computational
details. However, to emphasize the "sameness," an illustrative example
will be presented. (NOTE: The reader will find it rewarding to compare
the following example with Example 11.15.)

Example 12.12

Considering the data of Table 12.16 and using the methods described
in Section 11.11, the abbreviated ANOVA shown in Table 12.17 is
obtained.

TABLE 12.16-Yields (Converted to Bushels/Acre) of a Certain Grain
Crop in a Fertilizer Trial

                               Level of Fertilizer
               No          10 lbs.     20 lbs.     30 lbs.     40 lbs.
Block          Treatment   per Plot    per Plot    per Plot    per Plot
  1               20          25          36          35          43
  2               25          29          37          39          40
  3               23          31          29          31          36
  4               27          30          40          42          48
  5               19          27          33          44          47
Treatment
totals           114         142         175         191         214
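The analysis of Example 12.12 can be sketched in a few lines. This is a minimal illustration, assuming the usual orthogonal polynomial coefficients for five equally spaced levels (the text refers to a table of such coefficients in the preceding chapter, which is not reproduced here); the variable names are illustrative only.

```python
# Yields from Table 12.16: rows are blocks 1..5,
# columns are 0, 10, 20, 30, 40 lbs. of fertilizer per plot.
yields = [
    [20, 25, 36, 35, 43],
    [25, 29, 37, 39, 40],
    [23, 31, 29, 31, 36],
    [27, 30, 40, 42, 48],
    [19, 27, 33, 44, 47],
]
b, t = len(yields), len(yields[0])

grand = sum(map(sum, yields))
correction = grand ** 2 / (b * t)
block_ss = sum(sum(row) ** 2 for row in yields) / t - correction
totals = [sum(col) for col in zip(*yields)]          # treatment totals
treatment_ss = sum(T * T for T in totals) / b - correction
total_ss = sum(y * y for row in yields for y in row) - correction
error_ss = total_ss - block_ss - treatment_ss

# Assumed orthogonal polynomial coefficients for 5 equally spaced levels.
linear = (-2, -1, 0, 1, 2)
quadratic = (2, -1, -2, -1, 2)


def contrast_ss(coeffs):
    """(sum c_j T_j)^2 / (b * sum c_j^2), as in Equation (12.30)."""
    numerator = sum(c * T for c, T in zip(coeffs, totals)) ** 2
    return numerator / (b * sum(c * c for c in coeffs))


linear_ss = contrast_ss(linear)
quadratic_ss = contrast_ss(quadratic)
deviations_ss = treatment_ss - linear_ss - quadratic_ss

error_ms = error_ss / ((b - 1) * (t - 1))
f_linear = linear_ss / error_ms
f_quadratic = quadratic_ss / error_ms

print(f"blocks {block_ss:.2f}, treatments {treatment_ss:.2f}, error {error_ss:.2f}")
print(f"linear {linear_ss:.2f}, quadratic {quadratic_ss:.2f}, deviations {deviations_ss:.2f}")
print(f"F(linear) = {f_linear:.1f}, F(quadratic) = {f_quadratic:.2f}")
```

Nearly all of the treatment sum of squares is accounted for by the linear component, which is what the double asterisk in Table 12.17 records.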

TABLE 12.17-Abbreviated ANOVA for Data of Table 12.16 Showing
the Isolation of the Linear and Quadratic Portions of the Treatment
Sum of Squares

Source of Variation            Degrees of Freedom   Sum of Squares   Mean Square
Blocks                                 4                154.16           38.54
Treatments                             4               1256.56          314.14
  Linear                               1               1240.02         1240.02**
  Quadratic                            1                 10.41           10.41
  Deviations from regression           2                  6.13            3.07
Experimental error                    16                193.44           12.09

** Significant at α = 0.01.

12.12  FACTORIAL TREATMENT COMBINATIONS IN A
RANDOMIZED COMPLETE BLOCK DESIGN

Because of the detail with which the analysis of factorial treatment
combinations was discussed in connection with a completely randomized
design (see Section 11.12), only a summary discussion seems appropriate
here. Accordingly, all that will be given are two linear statistical
models, their associated ANOVA's, and some numerical examples.


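Two-Factor Case

$$Y_{ijk} = \mu + \rho_i + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \epsilon_{ijk}; \qquad i = 1, \ldots, r; \quad j = 1, \ldots, a; \quad k = 1, \ldots, b. \tag{12.34}$$

Three-Factor Case

$$Y_{ijkl} = \mu + \rho_i + \alpha_j + \beta_k + \gamma_l + (\alpha\beta)_{jk} + (\alpha\gamma)_{jl} + (\beta\gamma)_{kl} + (\alpha\beta\gamma)_{jkl} + \epsilon_{ijkl};$$
$$i = 1, \ldots, r; \quad j = 1, \ldots, a; \quad k = 1, \ldots, b; \quad l = 1, \ldots, c. \tag{12.35}$$

In the preceding equations, $\mu$ is the true mean effect, $\rho_i$ is the
true effect of the $i$th replicate (or block), the various terms
involving $\alpha$, $\beta$, and $\gamma$ are the true effects of the several
factors and their interactions, and the $\epsilon$'s are the true effects
of the experimental units. The generalized ANOVA's associated with
Equations (12.34) and (12.35) are shown in Tables 12.19 and 12.18,
respectively.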

Example 12.13

Consider the data in Table 12.20. Performing the usual calculations,
we obtain the abbreviated ANOVA shown in Table 12.21. Testing
$H_1\colon \alpha_j = 0$ ($j = 1, 2$), we calculate $F = 32.00/3.16 = 10.1$, and
this leads to the rejection of $H_1$. Testing $H_2\colon \beta_k = 0$
($k = 1, 2, 3, 4$), we calculate $F = 5.47/3.16 = 1.73$, and this does not
permit $H_2$ to be rejected. To test $H_3\colon (\alpha\beta)_{jk} = 0$
($j = 1, 2$; $k = 1, 2, 3, 4$), we calculate $F = 12.80/3.16 = 4.05$,
and this leads to rejection of $H_3$ at the 5 per cent significance level but


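TABLE 12.18-Generalized ANOVA for a Three-Factor Factorial in a
Randomized Complete Block Design

Source of            Degrees of             Sum of         Mean     Expected Mean Square
Variation            Freedom                Squares        Square   (Model I)
Mean                 1                      $M_{yy}$       M
Replicates           r - 1                  $R_{yy}$       R        $\sigma^2 + abc\sum_{i=1}^{r}\rho_i^2/(r-1)$
Treatments:
  A                  a - 1                  $A_{yy}$       A        $\sigma^2 + rbc\sum_{j=1}^{a}\alpha_j^2/(a-1)$
  B                  b - 1                  $B_{yy}$       B        $\sigma^2 + rac\sum_{k=1}^{b}\beta_k^2/(b-1)$
  C                  c - 1                  $C_{yy}$       C        $\sigma^2 + rab\sum_{l=1}^{c}\gamma_l^2/(c-1)$
  AB                 (a - 1)(b - 1)         $(AB)_{yy}$    AB       $\sigma^2 + rc\sum_{j=1}^{a}\sum_{k=1}^{b}(\alpha\beta)_{jk}^2/(a-1)(b-1)$
  AC                 (a - 1)(c - 1)         $(AC)_{yy}$    AC       $\sigma^2 + rb\sum_{j=1}^{a}\sum_{l=1}^{c}(\alpha\gamma)_{jl}^2/(a-1)(c-1)$
  BC                 (b - 1)(c - 1)         $(BC)_{yy}$    BC       $\sigma^2 + ra\sum_{k=1}^{b}\sum_{l=1}^{c}(\beta\gamma)_{kl}^2/(b-1)(c-1)$
  ABC                (a - 1)(b - 1)(c - 1)  $(ABC)_{yy}$   ABC      $\sigma^2 + r\sum_{j=1}^{a}\sum_{k=1}^{b}\sum_{l=1}^{c}(\alpha\beta\gamma)_{jkl}^2/(a-1)(b-1)(c-1)$
Experimental error   (r - 1)(abc - 1)       $E_{yy}$       E        $\sigma^2$
Total                rabc                   $\sum Y^2$

[Expected mean squares under Model II and under the two Model III
cases (a and b fixed, c random; a fixed, b and c random): entries not
legible in the source.]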


[TABLE 12.19: generalized ANOVA for the two-factor case of Equation
(12.34); the table is not recoverable from the source.]

TABLE 12.20-Yields of Soybeans at the Agronomy Farm, Ames, Iowa, 1949
(In bushels per acre)

Date of                               Replicate
Planting     Fertilizer       1        2        3        4
Early        Check          28.6     36.8     32.7     32.6
             Aero           29.1     29.2     30.6     29.1
             Na             28.4     27.4     26.0     29.3
             K              29.2     28.2     27.7     32.0
Late         Check          30.3     32.3     31.6     30.9
             Aero           32.7     30.8     31.0     33.8
             Na             30.3     32.7     33.0     33.9
             K              32.7     31.7     31.8     29.4

TABLE 12.21-Abbreviated ANOVA for Data of Table 12.20

Source of Variation               Degrees of   Sum of     Mean      Expected Mean Square
                                  Freedom      Squares    Square
Replicates                            3          7.31       2.44    $\sigma^2 + (8/3)\sum_{i=1}^{4}\rho_i^2$
Dates of planting                     1         32.00      32.00    $\sigma^2 + (16/1)\sum_{j=1}^{2}\alpha_j^2$
Fertilizers                           3         16.40       5.47    $\sigma^2 + (8/3)\sum_{k=1}^{4}\beta_k^2$
Fertilizers X dates of planting       3         38.40      12.80    $\sigma^2 + (4/3)\sum_{j=1}^{2}\sum_{k=1}^{4}(\alpha\beta)_{jk}^2$
Experimental error                   21         66.43       3.16    $\sigma^2$

not at the 1 per cent significance level. [NOTE: Depending on whether
we use α = 0.05 or α = 0.01, the recommendations will differ. If
α = 0.05, different fertilizers would probably be suggested for each
date of planting; if α = 0.01, it is possible that the same
recommendation (concerning fertilizers) would be made for each date of
planting.]
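The sums of squares and F-ratios of Example 12.13 can be recomputed directly from Table 12.20. The sketch below is illustrative rather than definitive; it assumes the data layout shown in that table (four replicates for each of the 2 x 4 treatment combinations), and all variable names are hypothetical.

```python
# data[date][fertilizer] = yields in replicates 1..4 (Table 12.20).
data = {
    "Early": {
        "Check": [28.6, 36.8, 32.7, 32.6],
        "Aero": [29.1, 29.2, 30.6, 29.1],
        "Na": [28.4, 27.4, 26.0, 29.3],
        "K": [29.2, 28.2, 27.7, 32.0],
    },
    "Late": {
        "Check": [30.3, 32.3, 31.6, 30.9],
        "Aero": [32.7, 30.8, 31.0, 33.8],
        "Na": [30.3, 32.7, 33.0, 33.9],
        "K": [32.7, 31.7, 31.8, 29.4],
    },
}
dates = list(data)
ferts = list(data["Early"])
r, a, b = 4, len(dates), len(ferts)

values = [y for d in dates for f in ferts for y in data[d][f]]
correction = sum(values) ** 2 / (r * a * b)

rep_totals = [sum(data[d][f][i] for d in dates for f in ferts) for i in range(r)]
date_totals = [sum(sum(data[d][f]) for f in ferts) for d in dates]
fert_totals = [sum(sum(data[d][f]) for d in dates) for f in ferts]
cell_totals = [sum(data[d][f]) for d in dates for f in ferts]

ss_reps = sum(x * x for x in rep_totals) / (a * b) - correction
ss_dates = sum(x * x for x in date_totals) / (r * b) - correction
ss_ferts = sum(x * x for x in fert_totals) / (r * a) - correction
ss_cells = sum(x * x for x in cell_totals) / r - correction
ss_inter = ss_cells - ss_dates - ss_ferts
ss_total = sum(y * y for y in values) - correction
ss_error = ss_total - ss_reps - ss_cells

ms_error = ss_error / ((r - 1) * (a * b - 1))
f_dates = (ss_dates / (a - 1)) / ms_error
f_ferts = (ss_ferts / (b - 1)) / ms_error
f_inter = (ss_inter / ((a - 1) * (b - 1))) / ms_error

print(f"replicates {ss_reps:.2f}, dates {ss_dates:.2f}, fertilizers {ss_ferts:.2f}")
print(f"interaction {ss_inter:.2f}, error {ss_error:.2f}")
print(f"F: dates {f_dates:.2f}, fertilizers {f_ferts:.2f}, interaction {f_inter:.2f}")
```

Small differences in the second decimal place relative to Table 12.21 arise because the table reports rounded entries.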

Example  12.14 

An  experiment  such  as  described  in  Example  10.18  was  performed. 
The  resulting  data  are  given  in  Table  12.22.  The  associated  ANOVA  is 
presented  in  Table  12.23.  It  will  be  noted  that  none  of  the  factors  led  to 
significant  results. 
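For a factorial in which every factor has two levels, each treatment sum of squares can be obtained from a single plus-and-minus contrast. The sketch below applies that idea to the data of Table 12.22. It is an illustration under the assumption that the table's columns are (a), (b), (c), Replicate I, Replicate II; the helper `effect_ss` is hypothetical. Results may differ from Table 12.23 in the fourth decimal place because of rounding in the printed table.

```python
from itertools import combinations
from math import prod

# ((a, b, c) levels, (replicate I, replicate II)) from Table 12.22.
runs = [
    ((0, 0, 0), (6.08, 6.79)),
    ((0, 0, 1), (6.31, 6.77)),
    ((0, 1, 0), (6.53, 6.73)),
    ((1, 0, 0), (6.04, 6.68)),
    ((0, 1, 1), (6.12, 6.49)),
    ((1, 0, 1), (6.09, 6.38)),
    ((1, 1, 0), (6.43, 6.08)),
    ((1, 1, 1), (6.36, 6.23)),
]
n = 2 * len(runs)  # 16 observations in all

grand = sum(sum(y) for _, y in runs)
mean_ss = grand ** 2 / n
rep_totals = [sum(y[i] for _, y in runs) for i in range(2)]
replicate_ss = sum(x * x for x in rep_totals) / len(runs) - mean_ss


def effect_ss(factors):
    """SS for a main effect or interaction from its +/- contrast."""
    contrast = sum(
        prod(2 * levels[f] - 1 for f in factors) * sum(y) for levels, y in runs
    )
    return contrast ** 2 / n


names = "ABC"
effects = {}
for size in (1, 2, 3):
    for combo in combinations(range(3), size):
        effects["".join(names[f] for f in combo)] = effect_ss(combo)

total_ss = sum(v * v for _, y in runs for v in y)  # uncorrected total
error_ss = total_ss - mean_ss - replicate_ss - sum(effects.values())
error_ms = error_ss / 7
f_ratios = {name: ss / error_ms for name, ss in effects.items()}

for name in effects:
    print(f"{name:4s} SS={effects[name]:.4f}  F={f_ratios[name]:.2f}")
print(f"error SS={error_ss:.4f}  MS={error_ms:.4f}")
```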

TABLE 12.22-Surge Voltages Resulting From the Experiment Described
in Example 10.18 and Discussed in Example 12.14

                            Heat
Electrolyte    Paper     Temperature         Replicate
    (a)         (b)          (c)            I         II
     0           0            0           6.08       6.79
     0           0            1           6.31       6.77
     0           1            0           6.53       6.73
     1           0            0           6.04       6.68
     0           1            1           6.12       6.49
     1           0            1           6.09       6.38
     1           1            0           6.43       6.08
     1           1            1           6.36       6.23

TABLE 12.23-ANOVA for Data of Table 12.22

Source of Variation    Degrees of   Sum of       Mean        F-Ratio
                       Freedom      Squares      Square
Mean                       1        651.6533     651.6533
Replicates                 1          0.2997       0.2997
Treatments:
  A                        1          0.1462       0.1462      2.21
  B                        1          0.0018       0.0018      0.03
  C                        1          0.0232       0.0232      0.35
  AB                       1          0.0001       0.0001      0.00
  AC                       1          0.0047       0.0047      0.07
  BC                       1          0.0176       0.0176      0.27
  ABC                      1          0.0883       0.0883      1.33
Experimental error         7          0.4632       0.0662
Total                     16        652.6981

So far in this section we have summarized the cases involving two
and three factors with one observation per experimental unit. However,
there are two other topics associated with factorials in a randomized
complete block design which also deserve our attention at this time.
These are: (1) subsampling and (2) the analysis of response curves.
Each of these will now be discussed.

Subsampling in a randomized complete block design which incorporates
factorial treatment combinations leads to analyses such as shown in
Tables 12.24 and 12.25. Since no new techniques are involved,
numerical examples will not be given. The reader is referred to
Sections 11.12 and 12.4 for further details.

As was mentioned in Section 11.12, when factorial treatment
combinations are involved, it is possible to subdivide the treatment
sum of squares into several parts such as $(A_L)_{yy}$, $(A_Q)_{yy}$,
$\ldots$; $(B_L)_{yy}$, $(B_Q)_{yy}$, $\ldots$; $(A_L B_L)_{yy}$,
$(A_Q B_L)_{yy}$, $(A_L B_Q)_{yy}$, and so on. The procedure to be
followed will parallel that presented in Section 11.11, the only
difference being the refinements introduced to subdivide the
interaction sum of squares. Because of this, the technique will be
presented in terms of two numerical examples.
TABLE 12.24-Abbreviated ANOVA for a Two-Factor Factorial in a
Randomized Complete Block Design With n Samples per
Experimental Unit

Source of Variation       Degrees of Freedom
Replicates                r - 1
Treatments:
  A                       a - 1
  B                       b - 1
  AB                      (a - 1)(b - 1)
Experimental error        (r - 1)(ab - 1)
Sampling error            rab(n - 1)

[Expected mean square column: not legible in the source.]

TABLE 12.25-Abbreviated ANOVA for a Two-Factor Factorial in a
Randomized Complete Block Design With n Samples per Experimental
Unit and d Determinations per Sampling Unit

Source of Variation       Degrees of Freedom
Replicates                r - 1
Treatments:
  A                       a - 1
  B                       b - 1
  AB                      (a - 1)(b - 1)
Experimental error        (r - 1)(ab - 1)
Sampling error            rab(n - 1)
Determinations            rabn(d - 1)

[Expected mean square column: not legible in the source.]

Example 12.15

Consider the data of Table 12.26. The first step in the analysis is the
formation of the a X b table shown in Table 12.27. Remembering that
each entry in Table 12.27 is the sum of r = 2 observations and using the
polynomial coefficients given in Table 11.26, we find that

$$(A_L)_{yy} = \frac{[(-3)(35) + (-1)(42) + (1)(59) + (3)(60)]^2}{(2)(3)[(-3)^2 + (-1)^2 + (1)^2 + (3)^2]} = 70.53$$

$$(A_Q)_{yy} = \frac{[(1)(35) + (-1)(42) + (-1)(59) + (1)(60)]^2}{(2)(3)[(1)^2 + (-1)^2 + (-1)^2 + (1)^2]} = 1.50$$

$$(A_C)_{yy} = \frac{[(-1)(35) + (3)(42) + (-3)(59) + (1)(60)]^2}{(2)(3)[(-1)^2 + (3)^2 + (-3)^2 + (1)^2]} = 5.63$$

$$(B_L)_{yy} = \frac{[(-1)(64) + (0)(65) + (1)(67)]^2}{(2)(4)[(-1)^2 + (0)^2 + (1)^2]} = .56$$

$$(B_Q)_{yy} = \frac{[(1)(64) + (-2)(65) + (1)(67)]^2}{(2)(4)[(1)^2 + (-2)^2 + (1)^2]} = .02$$

where the divisors are:

(1)  for $(A_L)_{yy}$, $(A_Q)_{yy}$, and $(A_C)_{yy}$: (rb) X (sum of the
squares of the coefficients);

(2)  for $(B_L)_{yy}$ and $(B_Q)_{yy}$: (ra) X (sum of the squares of the
coefficients).

TABLE 12.26-Coded Data for Use in Illustrating the Calculation of the
Linear, Quadratic, . . . Effects in a Two-Factor Factorial Experiment
Conducted in a Randomized Complete Block Design

Replicate           a1      a2      a3      a4
    1       b1       7       8       9       7
            b2       5       6      11      10
            b3       4       6      10      12
    2       b1       7       9       9       8
            b2       6       6      10      11
            b3       6       7      10      12

TABLE 12.27-a X b Table Formed From the Data of Table 12.26

            a1      a2      a3      a4     Totals
b1          14      17      18      15       64
b2          11      12      21      21       65
b3          10      13      20      24       67
Totals      35      42      59      60      196

To illustrate the computation of the various sums of squares which
comprise $(AB)_{yy}$, let us take the case of $(A_Q B_L)_{yy}$ as an
example. It should be understood that any other of the sums of squares
may be found in a similar manner if the appropriate word-substitution,
for quadratic and linear, is made in the next few sentences. Obtain the
sum of the products of the a quadratic polynomial coefficients by the
totals in the cells of the a X b table for each level of b. Then apply
the b linear polynomial coefficients to these "sums" and obtain the
usual sum of products. Square this last sum, and divide the squared
quantity by the product of the sums of squares of the two sets of
polynomial coefficients (quadratic for a and linear for b) used in the
computation. Also divide by r, the number of replicates, since each
total in the a X b table was the sum of r observations. The resulting
value is the sum of squares due to $A_Q B_L$. For our numerical example
we have

for $b_1$:   (1)(14) + (-1)(17) + (-1)(18) + (1)(15) = -6
for $b_2$:   (1)(11) + (-1)(12) + (-1)(21) + (1)(21) = -1
for $b_3$:   (1)(10) + (-1)(13) + (-1)(20) + (1)(24) = 1

and hence

$$(A_Q B_L)_{yy} = \frac{[(-1)(-6) + (0)(-1) + (1)(1)]^2}{2[(1)^2 + (-1)^2 + (-1)^2 + (1)^2][(-1)^2 + (0)^2 + (1)^2]} = 3.06.$$

The reader should verify that the remaining sums of squares in Table
12.28 may be found in a like manner.
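The verification suggested above can be carried out mechanically. The following sketch computes every single-degree-of-freedom component from the cell totals of Table 12.27. It assumes the standard orthogonal polynomial coefficients for four and three equally spaced levels (those quoted in Example 12.15), and treats a factor that is absent from an effect as having a coefficient of 1 at every level, so that its "sum of squared coefficients" is simply its number of levels. The function name `contrast_ss` is illustrative only.

```python
# Cell totals from Table 12.27: rows b1..b3, columns a1..a4.
# Each entry is the sum of r = 2 observations.
cell_totals = [
    [14, 17, 18, 15],
    [11, 12, 21, 21],
    [10, 13, 20, 24],
]
r = 2

# Assumed orthogonal polynomial coefficients for equally spaced levels.
a_poly = {"L": (-3, -1, 1, 3), "Q": (1, -1, -1, 1), "C": (-1, 3, -3, 1)}
b_poly = {"L": (-1, 0, 1), "Q": (1, -2, 1)}
ones_a, ones_b = (1, 1, 1, 1), (1, 1, 1)


def contrast_ss(wa, wb):
    """Squared weighted sum of cell totals over r * (sum wa^2) * (sum wb^2)."""
    total = sum(
        wb[k] * wa[j] * cell_totals[k][j] for k in range(3) for j in range(4)
    )
    return total ** 2 / (r * sum(w * w for w in wa) * sum(w * w for w in wb))


ss = {}
for na, wa in a_poly.items():
    ss["A" + na] = contrast_ss(wa, ones_b)      # main effect of a
for nb, wb in b_poly.items():
    ss["B" + nb] = contrast_ss(ones_a, wb)      # main effect of b
for na, wa in a_poly.items():
    for nb, wb in b_poly.items():
        ss[f"A{na}B{nb}"] = contrast_ss(wa, wb)  # interaction components

grand = sum(map(sum, cell_totals))
treatment_ss = sum(x * x for row in cell_totals for x in row) / r - grand ** 2 / (r * 12)

for name, value in ss.items():
    print(f"{name:5s} {value:7.2f}")
print(f"sum of components = {sum(ss.values()):.2f}, treatment SS = {treatment_ss:.2f}")
```

The eleven components add up to the treatment sum of squares with 11 degrees of freedom, which confirms that the partition is complete.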

TABLE 12.28-Abbreviated ANOVA for Data of Table 12.26

Source of Variation    Degrees of Freedom   Sum of Squares   Mean Square
Replicates                     1                 1.50            1.50
Treatments:
  $A_L$                        1                70.53           70.53
  $A_Q$                        1                 1.50            1.50
  $A_C$                        1                 5.63            5.63
  $B_L$                        1                  .56             .56
  $B_Q$                        1                  .02             .02
  $A_L B_L$                    1                25.31           25.31
  $A_L B_Q$                    1                 2.60            2.60
  $A_Q B_L$                    1                 3.06            3.06
  $A_Q B_Q$                    1                  .20             .20
  $A_C B_L$                    1                  .32             .32
  $A_C B_Q$                    1                 2.60            2.60
Experimental error            11                 3.50             .32

Example 12.16

Consider next the data of Table 12.29. To save time, the calculation
of the sums of squares will be illustrated for only three effects:
$A_L$, $A_Q B_L$, and $A_Q B_C C_L$. To check these results, the reader
will find it advantageous to form the a X b X c and the a X b tables.
Then,

$$(A_L)_{yy} = \frac{[(-1)(747) + (0)(1404) + (1)(925)]^2}{(2)(4)(6)[(-1)^2 + (0)^2 + (1)^2]} = 330.04$$

$$(A_Q B_L)_{yy} = \frac{[(1)(325) + (-2)(432) + (1)(-659)]^2}{(2)(6)[(1)^2 + (-2)^2 + (1)^2][(-3)^2 + (-1)^2 + (1)^2 + (3)^2]} = 996.67$$

$$(A_Q B_C C_L)_{yy} = \frac{[(1)(-449) + (-2)(2056) + (1)(329)]^2}{D} = 1066.06$$

where

$$D = 2[(1)^2 + (-2)^2 + (1)^2][(-1)^2 + (3)^2 + (-3)^2 + (1)^2][(-5)^2 + (-3)^2 + (-1)^2 + (1)^2 + (3)^2 + (5)^2]$$

and the divisors are:

(1)  for $(A_L)_{yy}$: (rbc) X (sum of the squares of the coefficients);

(2)  for $(A_Q B_L)_{yy}$: (rc) X (product of the sums of the squares of
the coefficients);

(3)  for $(A_Q B_C C_L)_{yy}$: (r) X (product of the sums of the squares
of the coefficients).
TABLE 12.29-Hypothetical Data for Use in Illustrating the Computation
of Certain Sums of Squares in a Three-Factor Factorial, the Basic
Design Being a Randomized Complete Block

                        a1                   a2                   a3
Replicate   c     b1  b2  b3  b4      b1  b2  b3  b4      b1  b2  b3  b4
    1       c1     7   7   9   7      15  36  60  15      24  29  17  19
            c2    23  18  25  15      13  35  61  18      30  26  11   8
            c3     9  18  24  23      12  43  62  14      31  24  15  23
            c4     7  13  25  36      11  12  63  26      32  15  12   5
            c5     6   8  20   7      15  46  18  28      15  32  13   6
            c6    10  12  30  11      10  42  27  12      17  29   8   7
    2       c1     7   6  11   7      15  35  60  20      25  30  20  20
            c2    20  19  25  16      13  30  64  20      30  25  15  10
            c3     9  22  26  24      13  40  66  15      32  25  15  22
            c4     8  15  26  30      13  10  66  25      34  15  15   4
            c5     8  10  20   8      17  40  20  30      18  35  15   5
            c6     9  12  28  11       8  45  30  15      19  30  10   8

The numerators for $(A_L)_{yy}$ and $(A_Q B_L)_{yy}$ were found by the
same procedures as similar quantities in the preceding example. The
numerator for $(A_Q B_C C_L)_{yy}$ was obtained by a simple extension of
the same principles. In this case the extension may be explained as
follows: Operate with the $C_L$ coefficients on the c entries in the
a X b X c table for each level of b within each level of a and obtain a
set of sums of products, one for each level of b within each level of a.
Next, use the $B_C$ coefficients and operate on the sums just obtained.
This will provide us with some "$B_C C_L$ totals" for each level of a.
Then use the $A_Q$ coefficients to give us the numerator for
$(A_Q B_C C_L)_{yy}$. The procedure should now be clear, and the
extension to any number of factors will be a simple, even if a
time-consuming, job.
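The nested "operate, then operate again" procedure amounts to weighting each observation by the product of the relevant coefficients. The sketch below applies this to the data of Table 12.29 for the three effects of Example 12.16. It is an illustration under stated assumptions: the data are entered with columns ordered a1b1, ..., a1b4, a2b1, ..., a3b4, the polynomial coefficients are the standard ones for 3, 4, and 6 equally spaced levels, and a factor not involved in an effect is given a coefficient of 1 at every level. The function name `contrast_ss` is hypothetical.

```python
# Rows are c1..c6; columns are a1(b1..b4), a2(b1..b4), a3(b1..b4).
rep1 = [
    [7, 7, 9, 7, 15, 36, 60, 15, 24, 29, 17, 19],
    [23, 18, 25, 15, 13, 35, 61, 18, 30, 26, 11, 8],
    [9, 18, 24, 23, 12, 43, 62, 14, 31, 24, 15, 23],
    [7, 13, 25, 36, 11, 12, 63, 26, 32, 15, 12, 5],
    [6, 8, 20, 7, 15, 46, 18, 28, 15, 32, 13, 6],
    [10, 12, 30, 11, 10, 42, 27, 12, 17, 29, 8, 7],
]
rep2 = [
    [7, 6, 11, 7, 15, 35, 60, 20, 25, 30, 20, 20],
    [20, 19, 25, 16, 13, 30, 64, 20, 30, 25, 15, 10],
    [9, 22, 26, 24, 13, 40, 66, 15, 32, 25, 15, 22],
    [8, 15, 26, 30, 13, 10, 66, 25, 34, 15, 15, 4],
    [8, 10, 20, 8, 17, 40, 20, 30, 18, 35, 15, 5],
    [9, 12, 28, 11, 8, 45, 30, 15, 19, 30, 10, 8],
]
replicates = (rep1, rep2)
r, a, b, c = len(replicates), 3, 4, 6

ones = lambda n: (1,) * n
A_L, A_Q = (-1, 0, 1), (1, -2, 1)
B_L, B_C = (-3, -1, 1, 3), (-1, 3, -3, 1)
C_L = (-5, -3, -1, 1, 3, 5)


def contrast_ss(wa, wb, wc):
    """Weight each observation by wa[j]*wb[k]*wc[l], sum, square, divide."""
    total = sum(
        wa[j] * wb[k] * wc[l] * rep[l][b * j + k]
        for rep in replicates
        for l in range(c)
        for j in range(a)
        for k in range(b)
    )
    divisor = r * sum(w * w for w in wa) * sum(w * w for w in wb) * sum(w * w for w in wc)
    return total ** 2 / divisor


ss_AL = contrast_ss(A_L, ones(b), ones(c))
ss_AQBL = contrast_ss(A_Q, B_L, ones(c))
ss_AQBCCL = contrast_ss(A_Q, B_C, C_L)
print(f"{ss_AL:.2f} {ss_AQBL:.2f} {ss_AQBCCL:.2f}")
```

Using a vector of ones for an absent factor reproduces the divisors listed in the example, since the sum of its squares is just the number of levels (for instance, rbc times the sum of squared coefficients for $A_L$).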

12.13  MISSING DATA IN A RANDOMIZED COMPLETE
BLOCK DESIGN

Many times, even after considerable effort has been expended and
due diligence exercised in planning an experiment, there are things
which occur to the disadvantage of the research worker. One of the
most common of these "disturbances" is the problem of missing
observations. Missing observations arise for many reasons: An animal
may die, an experimental plot may be flooded out, a worker may be ill
and not turn up on the job, a jar of jelly may be dropped on the floor,
or the recorded data may be lost. What effect does this have on our
methods of analysis? Since most experiments are designed with at least
some degree of balance, or symmetry, any missing observations will
usually destroy this balance. Thus, we now expect our original planned
analysis to be complicated and some modifications in procedure to be
required. We could, of course, in many instances treat the data as a
case of disproportionate subclass numbers and use methods of analysis
appropriate to such situations (see Chapter 13). However, other
approaches are sometimes open to the statistician, and we shall
examine these in this section, pointing out the difficulties which
arise and indicating the computational procedures to be followed in
each case.

First, let us mention two cases of missing data in a randomized
complete block design which present no difficulties as to computational
procedures: (1) a complete block is missing, or (2) a treatment is
completely missing. When one or more complete blocks are missing
we simply proceed with the standard type of analysis, provided we
still have at least two blocks remaining; that is, we analyze the data
as though we had planned only on the number of blocks which are
actually available. For the case in which no data are available on one
or more treatments (assuming we still have at least two treatments
remaining), we may again proceed in the regular manner. However,
in this instance, the research worker should certainly inquire into the
reasons for the lack of data on certain treatments. It is apparent that
many things might have caused such a happening, each of which could
possibly lead to different decisions or recommendations on the part of
the experimenter. Without a specific example, further discussion on
such points can only be of a vague nature; rather than continue in
general terms, we shall postpone further remarks on this type of
situation until the need arises.

A more commonly occurring situation is the one in which one
observation is missing. Here we run into difficulty in our analysis.
Either we must treat the data by methods appropriate to disproportionate
frequencies, or we must find some other scheme which we hope will be
simpler to apply. One such device is to estimate a value to replace the
missing observation and then to proceed with the usual analysis for
randomized complete block designs. How does one obtain an estimate
of the missing observation? The estimation procedure currently favored
by statisticians is to assign that value for the missing observation
which will minimize the experimental error sum of squares when the
regular analysis is performed. To mathematics students this is another
familiar problem in differential calculus; calling the missing
observation M, the experimental error sum of squares is computed, or
rather the algebraic expression for the experimental error sum of
squares is formulated, and by differentiating this expression with
respect to M and equating to 0, a solution may be obtained. For the
student not proficient in mathematics, this procedure may be summarized
by the following formula which will provide an estimate of the missing
observation in accordance with the above principle:

$$M = \frac{tT + bB - S}{(t - 1)(b - 1)}, \tag{12.36}$$

where

t = number of treatments
b = number of blocks
T = sum of observations with the same treatment as the missing
    observation
B = sum of observations in the same block as the missing observation
S = sum of all the actual observations.

This value, that is, M, is then entered in the appropriate place in the
table of data and the augmented data are analyzed in the customary
manner.
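For readers who wish to see where Equation (12.36) comes from, the minimization can be sketched as follows. This is a reconstruction from the definitions above rather than a derivation quoted from the text; the symbol $K$, which collects all terms that do not involve $M$, is introduced here for convenience only.

```latex
\begin{align*}
E_{yy}(M) &= M^{2} - \frac{(B+M)^{2}}{t} - \frac{(T+M)^{2}}{b}
             + \frac{(S+M)^{2}}{bt} + K, \\
\frac{dE_{yy}}{dM} &= 2M - \frac{2(B+M)}{t} - \frac{2(T+M)}{b}
             + \frac{2(S+M)}{bt} = 0, \\
M\,(bt - b - t + 1) &= bB + tT - S, \\
M &= \frac{tT + bB - S}{(t-1)(b-1)}.
\end{align*}
```

The first line uses the fact that each block total is divided by $t$, each treatment total by $b$, and the grand total by $bt$ in the usual randomized complete block computations.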

We are now ready to construct our analysis of variance table and to
test the hypothesis $H\colon \tau_j = 0$ ($j = 1, \ldots, t$). However,
certain changes must be made in the form of our analysis of variance
table if we are to avoid biased results. The first change is easy to
apply and proceeds as follows: Reduce the degrees of freedom associated
with both experimental error and total by 1. That is, the degrees of
freedom for experimental error become $(b-1)(t-1) - 1$ and the degrees
of freedom for total become $bt - 1$. The second change is a little more
cumbersome to apply. Before detailing this change, let us discuss what
it is and why it is necessary. It may be proved that, under the null
hypothesis, the expected value of $T_{yy}/(t-1)$, the treatment mean
square calculated from the augmented data, is greater than $\sigma^2$,
the expected value of the experimental error mean square. Thus any test
of hypothesis which does not correct for this fact will be a biased test
and can only be considered approximate. The correction for this bias,
the second change mentioned above, is to decrease the treatment sum of
squares, $T_{yy}$, by the amount

$$\text{Correction for bias} = Z = \frac{[B - (t-1)M]^2}{t(t-1)}, \tag{12.37}$$

which gives us a new treatment sum of squares

$$T'_{yy} = T_{yy} - Z, \tag{12.38}$$

and the analysis of variance indicated in Table 12.30 is finally obtained.

TABLE 12.30-Generalized ANOVA for a Randomized Complete Block
Design With One Missing Observation

Source of Variation    Degrees of Freedom     Sum of Squares    Mean Square   F-Ratio
Mean                   1                      $M_{yy}$          M
Blocks                 b - 1                  $B_{yy}$          B
Treatments             t - 1                  $T'_{yy}$         T'            T'/E
Experimental error     (b - 1)(t - 1) - 1     $E_{yy}$          E
Total                  bt - 1                 $\sum Y^2 - Z$

Example 12.17

An experiment was conducted by Tinker (13) to investigate the
consistency of blink-rates during reading. Data were recorded for six
successive 5-minute periods of reading. As we have extracted only part
of the available data for our example, care should be exercised in
drawing conclusions from the analysis which follows. The original paper
should be consulted by those desiring further information on the
subject matter. We will assume that the experiment was performed on
four individuals (they will be our blocks) and the six periods will
represent the treatments. Moreover, to illustrate the techniques of
this section, we will assume that the observation on Subject A for the
fourth period is missing. Our observed data are given in Table 12.31.

TABLE 12.31-Number of Blinks for Successive Five-Minute Periods of Reading

                          Periods
Subjects     1      2      3      4      5      6
A           24     23     28      -     30     41
B           18     17     17     19     19     18
C           41     41     49     39     19     27
D           46     69     74     58     54     50

Adapted from M. A. Tinker, "Reliability of Blinking Frequency Employed as a Measure
of Readability," Jour. Exp. Psych., XXXV, 421.

Substituting in our formula, we find our estimate of the missing
value to be

$$M = \frac{tT + bB - S}{(t - 1)(b - 1)} = \frac{6(116) + 4(146) - 821}{5(3)} = 30.6.$$

The  correction  for  bias  in  the  treatment  sum  of  squares  is  found  to  be 

$$Z = \frac{[B - (t - 1)M]^2}{t(t - 1)} = \frac{[146 - 5(30.6)]^2}{6(5)} = 1.63,$$

and  so  we  arrive  at  the  analysis  of  variance  presented  in  Table  12.32. 

TABLE 12.32-Abbreviated ANOVA of Number of Blinks During Reading

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Blocks                  3                    5233.82           1744.61
Treatments              5                     339.90             67.98
Experimental error     14                    1068.40             76.29
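The arithmetic behind Table 12.32 can be retraced with a short script. The following is only a sketch: the function name `rcb_one_missing` and the layout of its result are illustrative choices, not part of the text, and it assumes exactly one missing value in a randomized complete block table whose rows are blocks and whose columns are treatments.

```python
# Table 12.31: rows are subjects A-D (blocks), columns are periods 1-6
# (treatments). None marks the observation assumed to be missing.
BLINKS = [
    [24, 23, 28, None, 30, 41],
    [18, 17, 17, 19, 19, 18],
    [41, 41, 49, 39, 19, 27],
    [46, 69, 74, 58, 54, 50],
]


def rcb_one_missing(table):
    """Sums of squares for an RCB design with exactly one missing value."""
    b, t = len(table), len(table[0])
    holes = [(r, c) for r in range(b) for c in range(t) if table[r][c] is None]
    if len(holes) != 1:
        raise ValueError("this sketch handles exactly one missing value")
    i, j = holes[0]

    B = sum(v for v in table[i] if v is not None)       # total of the affected block
    T = sum(table[r][j] for r in range(b) if r != i)    # total of the affected treatment
    S = sum(v for row in table for v in row if v is not None)  # total of all recorded values

    M = (t * T + b * B - S) / ((t - 1) * (b - 1))        # estimate of the missing value
    Z = (B - (t - 1) * M) ** 2 / (t * (t - 1))           # correction for bias, Eq. (12.37)

    # Usual RCB sums of squares computed from the augmented (filled-in) data.
    filled = [[M if v is None else v for v in row] for row in table]
    cf = (S + M) ** 2 / (b * t)                          # correction for the mean
    block_ss = sum(sum(row) ** 2 for row in filled) / t - cf
    treat_raw = sum(sum(row[c] for row in filled) ** 2 for c in range(t)) / b - cf
    total_ss = sum(v * v for row in filled for v in row) - cf
    error_ss = total_ss - block_ss - treat_raw

    return {
        "M": M,
        "Z": Z,
        "blocks": (b - 1, block_ss),
        "treatments": (t - 1, treat_raw - Z),             # Eq. (12.38)
        "error": ((b - 1) * (t - 1) - 1, error_ss),       # one d.f. lost to the estimate
    }


result = rcb_one_missing(BLINKS)
for name in ("blocks", "treatments", "error"):
    df, ss = result[name]
    print(f"{name:<12}{df:>4}{ss:>10.2f}{ss / df:>10.2f}")
```

The printed rows give the degrees of freedom, sum of squares, and mean square for each source, so they can be set beside the entries of Table 12.32.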

The test of the null hypothesis $H\colon \tau_j = 0\ (j = 1, \ldots, 6)$ gives rise to an
F-value less than unity. Noting that $F' = 1/F$ is not significant, we
conclude that the blink-rate is consistent during reading when
measured over six successive 5-minute periods. It is evident that there are
wide differences among individuals, a fact which is not surprising and
which confirms our judgment in performing the experiment as we did,
that is, by removing the inter-individual differences which otherwise
would have appeared as part of the experimental error sum of squares.
(NOTE: Actually, the value we have assumed to be missing was
recorded as 27 in the original source of data. It will pay the reader to do
the analysis with the true value entered in the table for comparison with
the approximate solution presented above. This should give him an
indication, but only an indication, of how reliable the estimation
procedure is.)

If two or more values are missing, the same general procedure (using
the calculus) may be followed to provide estimates. For the person
not familiar with the requisite mathematical techniques, equivalent
results may be obtained by use of the following iterative method.
Suppose that two values are missing: For one of these substitute the
mean of all recorded observations, and then estimate the second
missing value using Equation (12.36); next, place this estimate in its
proper place in the table, remove the general mean from its position
in place of the first missing observation, and then estimate a value
for the first missing observation using Equation (12.36). After about
two cycles you will find very little or no change in successive estimates
of the same missing value. When this point is reached, you have the
estimated values. This procedure may easily be extended to cases where
three or more observations are missing.
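The iterative method just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed routine: the names `estimate_cell` and `estimate_missing`, the convergence tolerance, and the cap on the number of cycles are all hypothetical choices, and the second missing cell in `TWO_MISSING` (Subject C, period 2) is assumed missing purely for demonstration. The single-cell formula is the one used for the missing value in Example 12.17.

```python
def estimate_cell(work, i, j):
    """Missing-value formula M = (tT + bB - S) / ((t-1)(b-1)) for cell (i, j).

    `work` is a fully filled table (rows = blocks, columns = treatments);
    the current entry in cell (i, j) is excluded from every total.
    """
    b, t = len(work), len(work[0])
    B = sum(work[i]) - work[i][j]                  # block total without the cell
    T = sum(row[j] for row in work) - work[i][j]   # treatment total without the cell
    S = sum(map(sum, work)) - work[i][j]           # grand total without the cell
    return (t * T + b * B - S) / ((t - 1) * (b - 1))


def estimate_missing(table, tol=1e-6, max_cycles=50):
    """Cycle through the missing cells (marked None) until estimates settle."""
    missing = [(i, j) for i, row in enumerate(table)
               for j, v in enumerate(row) if v is None]
    known = [v for row in table for v in row if v is not None]
    mean = sum(known) / len(known)
    # Start with the mean of all recorded observations in every missing cell.
    work = [[mean if v is None else float(v) for v in row] for row in table]
    # As in the text: the first missing cell keeps the mean while the
    # others are estimated, and it is re-estimated last in each cycle.
    order = missing[1:] + missing[:1]
    for _ in range(max_cycles):
        largest_change = 0.0
        for i, j in order:
            new = estimate_cell(work, i, j)
            largest_change = max(largest_change, abs(new - work[i][j]))
            work[i][j] = new
        if largest_change < tol:
            break
    return {cell: work[cell[0]][cell[1]] for cell in missing}


# Table 12.31 with the one value treated as missing in Example 12.17.
ONE_MISSING = [
    [24, 23, 28, None, 30, 41],
    [18, 17, 17, 19, 19, 18],
    [41, 41, 49, 39, 19, 27],
    [46, 69, 74, 58, 54, 50],
]
# Same table with a second cell removed, for illustration only.
TWO_MISSING = [row[:] for row in ONE_MISSING]
TWO_MISSING[2][1] = None

print(estimate_missing(ONE_MISSING))
print(estimate_missing(TWO_MISSING))
```

With a single missing cell the loop stops after its second pass, since the formula does not depend on the starting value; with several missing cells each pass uses the latest estimates of the others.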

What changes are necessary before one proceeds with the usual
F-tests if we have been forced to estimate several missing values? First,
we must reduce the degrees of freedom associated with both
experimental error and total by the number of observations estimated.
Second, the treatment sum of squares must be reduced by a specified
quantity to avoid a biased test procedure. If we have only two missing
observations (not in the same block), the necessary correction for bias
is given by

$$Z = \frac{[B' - (t - 1)M']^2 + [B'' - (t - 1)M'']^2}{t(t - 1)}, \tag{12.39}$$

where

t = number of treatments
B' = total of all the observations in the same block as the first
missing observation
B'' = total of all the observations in the same block as the second
missing observation
M' = estimate of the first missing observation
M'' = estimate of the second missing observation.

If  more  than  two  observations  are  missing,  or  if  two  observations  are 
missing  in  the  same  block,  a  formula  giving  the  correction  for  the  bias 
in  the  treatment  sum  of  squares  may  be  found  in  Yates  (14,  15). 
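As a small illustration of Equations (12.37) and (12.39), the correction can be written as a sum of one term per missing observation. The function name `bias_correction` and its argument layout are hypothetical; the sketch assumes, as the text requires, that no two missing observations fall in the same block, and it should not be used for the cases referred to Yates.

```python
def bias_correction(t, block_totals, estimates):
    """Correction Z for the treatment sum of squares.

    t            -- number of treatments
    block_totals -- recorded total of each block that contains a missing
                    observation (B, or B' and B'')
    estimates    -- estimate of each missing observation, in the same order
                    (M, or M' and M'')
    Assumes the missing observations lie in different blocks.
    """
    if len(block_totals) != len(estimates):
        raise ValueError("need one block total per estimated value")
    if not 1 <= len(estimates) <= 2:
        raise ValueError("Equations (12.37) and (12.39) cover one or two missing values")
    numerator = sum((B - (t - 1) * M) ** 2
                    for B, M in zip(block_totals, estimates))
    return numerator / (t * (t - 1))


# Example 12.17: t = 6, B = 146, M = 30.6.
z_example = bias_correction(6, [146], [30.6])
print(round(z_example, 2))
```

For two missing values the call takes both block totals and both estimates, for example `bias_correction(t, [B1, B2], [M1, M2])`, where the four arguments come from the table at hand.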

Problems 

12.1 The following data are from an experiment involving a randomized
complete block design. Complete the appropriate analysis of
variance, and test the hypothesis that the true effects of the four
treatments are equal. State all your assumptions.

                  Treatment
Block      1      2      3      4
1         20     18     16     17
2         18     18     16     20
3         20     18     17     18
4         20     16     20     17
5         19     16     16     20

12.2  In  a  randomized  complete  block  experiment  with  5  treatments  in  10 
replications,  the  variance  among  the  5  treatment  means  was  100. 
Complete  the  following  abbreviated  ANOVA,  and  test  the  hypothesis 
that  the  5  treatment  effects  are  the  same. 


Source  of  Variation 

Degrees  of 
Freedom 

Sum  of 
Squares 

Mean 

Square 

Replicates  

90 

Treatments  

Experimental  error  .  .  . 

5 

12.3  Upon  calculating  the  analysis  of  variance  of  the  yields  of  6  varieties 
planted  in  8  randomized  complete  blocks,  the  3  sums  of  squares,  for 
varieties,    for    blocks,    and    for    experimental    error    (or    remainder), 
were    each    245.    Complete,    as    far    as    is    possible,    the    appropriate 
analysis    of   variance,    and    compute    a    value    of    F   for    testing    the 
significance  of  the  differences  among  varieties.  Interpret  your  result 
in  terms  of  the  appropriate  model,  and  give  your  conclusions. 

12.4 Examine the results given below to learn about the effectiveness of
chalk and lime applications in neutralizing soil acidity and thus
increasing the stand of beets.

Number of Beets per Plot

Block     Control    Chalk    Lime
1           49        135     147
2           37        151     131
3          114        143     103
4          140        146     147

12.5 Analyze the data in the following table and interpret the results.

RATIO OF DRY TO WET WHEAT

                   Nitrogen Applied
Block     None     Early    Middle    Late
1         0.718    0.732    0.734     0.792
2         0.725    0.781    0.725     0.716
3         0.704    1.035    0.763     0.758
4         0.726    0.765    0.738     0.781

12.6 To study the relative efficiencies of 5 different types of filter, an
experiment is to be performed using a certain brand of oil. Fifteen
quarts of oil (in 1-qt. tins) are purchased and the same amount of
foreign material is added to each quart. Since only 5 tests can be
performed in any one day, we proceed as follows: (1) allocate, at
random, the 15 quarts into three groups of 5 each; (2) allocating the
groups to the days, assign the treatments at random to the quarts
within groups; (3) perform the experiment; and (4) collect, analyze
and interpret the data.

AMOUNT OF FOREIGN MATERIAL CAUGHT BY FILTER

                       Type of Filter
Blocks (Days)     A       B       C       D       E
1                16.9    18.2    17.0    15.1    18.3
2                16.5    19.2    18.1    16.0    18.3
3                17.5    17.1    17.3    17.8    19.8

12.7 In a paired experiment there were 10 pairs with the sum of the
squares of the deviations of the differences from their mean being
$\sum d^2 = 360$. The totals for the 2 treatments were $T_1 = 160$ and
$T_2 = 120$. Complete the following abbreviated ANOVA.


Source  of  Variation 

Degrees  of 
Freedom 

Sum  of 
Squares 

Mean 
Square 

Pairs  or  replications 

100 

Treatments  ,  ,  .  . 

Experimental error

12.8 Discuss the following statement: "If only one sample is obtained from
each experimental unit, e.g., if one small sample is taken from a
field plot to estimate the effect on the whole plot, n is set equal to 1
in Table 12.6, and the line for sampling error is omitted. However,
if the whole plot in our field plot example is harvested, then the
sampling error is reduced to 0, and we have an analysis as in Table
12.2."

12.9 In a randomized complete block experiment on the accuracy of
determination of ascorbic acid concentration in turnip greens
(Heinze-Kanapaugh method), 4 weights of sample were tried in 5 replications.
Two determinations, A and B, were made on each sample. The results
(in micrograms per milliliter of filtrate) were as follows:

                                      Replication
Sample          1             2             3             4             5
Weight
(Grams)      A      B      A      B      A      B      A      B      A      B
5          34.2   37.2   47.0   52.5   48.5   46.5   44.2   44.2   42.5   43.5
2          12.8   12.8   21.5   22.0   24.5   23.0   17.8   17.8   17.0   17.5
1           5.8    8.2   10.2   13.0   16.5   11.0    9.5   15.2   11.0   10.5
0.5         3.5    3.5    5.0    6.0    9.8    6.8    5.2    3.5    3.8    4.7

Complete the analysis of variance for these data.
12.10 Given the following abbreviated ANOVA:

Source of Variation    Degrees of Freedom    Mean Square    Expected Mean Square
Replicates               3                   176
Treatments               7                   352
Experimental error      21                    88
Sampling error          96                    40
Determinations         256                    10

(a) Give the experimental error mean square in the above analysis
for the following:

(1) if 10 had been added to each determination,

(2) if each determination had been multiplied by 10.

(b) Fill in the expected mean squares in the above table, assuming we
are interested in just these 8 treatments but that replicates,
samples, and determinations may be considered as random
variables.

12.11 Given the following abbreviated ANOVA:

Source of Variation     Degrees of Freedom    Mean Square
Blocks                    3
Treatments                8
Experimental error       24                   1084
Samples within plots    144                    381

(a) What is the construction of the experiment?

(b) What is the variance of a treatment mean?

(c) Give the answer to (b) if we have only 1 sample per plot.

(d) What is the maximum precision obtainable by sampling; i.e.,
we take k samples, k → ∞?

12.12 We conducted a field experiment to estimate the effect of 9 fertilizers
on the yield of oats. Instead of harvesting each plot completely, we
took 12 samples, 3 by 3 feet, from each plot. The abbreviated
ANOVA is as follows:

Source of Variation            Degrees of Freedom    Mean Square
Replicates                       3                   384
Treatments                       8                   960
Experimental error              24                   192
Among samples within plots     396                    24

(a) Assuming that the components of variance do not change,
estimate the gain or loss in information in the above experiment, had
6 replicates been used with 8 samples per plot.

(b) What would the above mean squares be if the analysis of variance
had been computed using the totals of the 12 samples in each
plot?

12.13 Given the following abbreviated ANOVA:


Source of Variation                         Degrees of Freedom    Mean Square    Expected Mean Square
Replicates                                    3                   288
Treatments                                    7                   432
Experimental error                           21                   144
Among samples within experimental units      96                    72
Among determinations per sample             256                     6

(a) Compute the variance of a treatment mean.
(b) Give the expected mean squares.
(c) Compute the gain or loss in efficiency, or information, had 6
replicates been used with 8 samples from each experimental unit and
1 determination per sample.

(d) Give the experimental error mean square for the following:

(1) if 10 had been added to each determination,

(2) if each determination had been multiplied by 10.

(e) Test the hypothesis that there are no differences among the true
effects of the eight treatments.

12.14 A chemist is confronted with the problem of just where he should
expend his efforts in the following situation: A series of 8 soil
treatments are applied in a randomized complete block design with 2
replications, 3 soil samples from each plot are taken in the field, each
sample is divided into 2 portions in the laboratory, and duplicate
determinations for each portion are analyzed for nitrate nitrogen. The
following mean squares are given:

Source of Variation               Degrees of Freedom    Mean Square
Treatments                          7                   11700
Experimental error                  7                    1300
Samples within plots               32                     100
Portions within samples            48                      20
Determinations within portions     96                      16

Find the expected mean squares and estimate the variance
components. What might be his gain or loss in efficiency in future
experiments if he used 6 replicates, but still continued to run only 24
analyses per treatment, e.g., 2 samples per plot, 2 portions per sample
and 1 determination per portion?

12.15 In an experiment to test the effect of 6 treatments on some soil
characteristic, we obtained the following abbreviated ANOVA. A
total of 6 soil samples was selected at random from each plot, and
2 chemical determinations were made of each sample.


Source of Variation           Degrees of Freedom    Mean Square
Replicates                      4                   240
Treatments                      5                   360
Experimental error             20                   120
Samples within plots          150                    60
Determinations per sample     180                     4

(a) Compute the variance of a treatment mean (per determination).

(b) Estimate the gain or loss in efficiency in the above experiment if
we had taken 8 samples per plot and had made only 1
determination per sample.

12.16 Given the following abbreviated ANOVA for a randomized complete
block design:

Source of Variation    Degrees of Freedom    Sum of Squares    Expected Mean Square
Blocks                   9                    .4074
Treatments               3                   1.1986
Experimental error      27                    .6249

(a) Complete the analysis; fill in expected mean squares.
(b) Estimate the efficiency of this design relative to a completely
randomized design.

(c) Compute the standard error for a treatment mean and for the
difference between 2 treatment means.

(d) The treatment means are 1.464, 1.195, 1.325, and 1.662. What
mean or means do you suspect might represent different
populations?


12.17 The following data give the gains in weight of pigs in a comparative
feeding trial. Analyze and interpret the data, paying attention to
the comparison of Rations I, II, and III with Rations IV and V.

GAINS OF PIGS IN A COMPARATIVE FEEDING TRIAL

Replicate    Ration I    Ration II    Ration III    Ration IV    Ration V
1            165         168          164           185          201
2            156         180          156           195          189
3            159         180          189           186          173
4            167         166          138           201          193
5            170         170          153           165          164
6            146         161          190           175          160
7            130         171          160           187          200
8            151         169          172           177          142
9            164         179          142           166          184
10           158         191          155           165          149

12.18 The following data are extracted from a larger experiment concerned
with oat-seed treatment. The following yields in grains were obtained
with 2 rates of the same compound over 7 replicates:

                   Rate
Replicate      1       2       Check
1             360     391      408
2             436     382      409
3             413     414      340
4             353     416      324
5             328     375      304
6             269     422      268
7             220     227      290

What conclusion do you draw? In a separate column are the yields
for the untreated seed. Is seed treatment worth the added expense in
this instance?

12.19 Results similar to those in Problem 12.18 are also available for flax.
What advice would you give about the use of Ceresan M as against
224, and about the advisability of seed treatment?

Replicate    Ceresan M    224     Check
1            19.2         14.4    13.2
2            14.8         24.6    19.2
3            26.7         22.9    17.4
4            17.6         22.7    16.4
5            22.1         22.0    15.8
6            21.7         22.0    14.6
7            23.9         20.4    12.5
8            19.1         16.0    13.0

12.20 A project studying farm structures was concerned with the insulation
of poultry houses. The data obtained from a study of a set of model
structures (total number of eggs over 4 replicates of each treatment)
were as follows:

Standard house + laying mash                        250
3" wall insulation + laying mash                    280
3" wall insulation + laying mash + cod liver oil    350
6" wall insulation + laying mash                    310
6" wall insulation + laying mash + cod liver oil    400

Construct a reasonable set of 4 orthogonal comparisons based on the
above treatments. Calculate the sum of squares for one of your
comparisons and test for significance. The following is part of the original
analysis:

Source of Variation    Degrees of Freedom    Sum of Squares
Treatments               4                   3470
Experimental error      12                   1728

12.21 Assume a randomized complete block experiment with 4 treatments
and 8 replicates. One of the treatments is a control or check and the
other 3 are different methods of treatment. Assume that the mean
effect of all 32 experimental units is 40, that the mean effect for the
control is 34, and that the mean effect for method B is 42. Also the
following abbreviated ANOVA is given:

Source of Variation    Degrees of Freedom    Mean Square
Replicates               7                   32
Treatments               3                   64
Experimental error      21                   16

(a) What is the experimental error variance per experimental unit?
(b) Compute the coefficient of variation per experimental unit.

(c) Compute the variance of a treatment mean.

(d) Is the difference between the mean effects of control and method
B significant at the 1 per cent level?

(e) Compute and interpret the 95 per cent confidence interval
estimate of the mean difference between the control and method B.

12.22 An experiment was conducted to assess the relative merits of 5
different gasolines. Since vehicle to vehicle variations in performance
are inevitable, the test was run using 5 cars, hereafter called blocks.
The following descriptions of the 5 gasolines are available:
A: control
B: control + additive X manufactured by company I
C: control + additive Y manufactured by company I
D: control + additive U manufactured by company II
E: control + additive V manufactured by company II.
The data, in miles per gallon, are given below. Please analyze and
interpret the data.

                       Blocks (Cars)
Treatments
(Gasolines)      1      2      3      4      5
A               22     20     18     17     19
B               28     24     23     19     25
C               21     23     25     25     27
D               26     21     21     22     20
E               27     25     22     20     24

12.23 Subdivide the experimental error sum of squares in each of the
following problems in accordance with the principles given in Section 12.9:

(a) 12.4     (d) 12.19
(b) 12.17    (e) 12.20
(c) 12.18    (f) 12.22

12.24 Using the technique presented in Section 11.10, analyze further the
data given in the following problems:

(a) 12.1    (d) 12.6     (g) 12.19
(b) 12.4    (e) 12.17    (h) 12.20
(c) 12.5    (f) 12.18    (i) 12.22

12.25 In Table 12.31 we presented some data on blinking rates in successive
5-minute periods of reading. After substituting for a missing
observation, these data were analyzed using a randomized complete block
design. Ignoring the fact that we had to estimate a missing
observation, discuss critically the use of a randomized complete block design
in analyzing data of this type. If you feel that the use of a
randomized complete block design was unjustified, state reasons to support
your contention and give what you believe to be an appropriate
method of handling such data. Examine all your assumptions
carefully.

12.26 Five levels of fertilizer, 0, 10, 20, 30, and 40, were applied to corn in a
randomized complete block design. A preliminary analysis of
variance gave the following results:

                        d.f.    M.S.
Replicates               4      2500
Fertilizers              4      2800
Experimental error      16      1500

The sums of the yields in the 5 plots of each level were:

Level            0     10     20     30     40
Total yield     20    140    260    300    280

What additional computations would you make to interpret the
effect of treatments? Make these computations, and interpret the
results.

12.27 The strength index of cotton fibers was thought to be affected by the
application of potash to the soil. A randomized complete block
experiment was conducted to get evidence. Here is a summary of the plot
strength indexes:

Treatment                     Replications
(Pounds of K2O
per Acre)              1        2        3
36                    7.62     8.00     7.93
54                    8.14     8.15     7.87
72                    7.76     7.73     7.74
108                   7.17     7.57     7.80
144                   7.46     7.68     7.21

Analyze the data. Plot the mean strength index for each treatment,
Y, against the pounds of fertilizer per acre, X. The sum of squares
attributable to regression is 0.5662 with 1 degree of freedom (verify
this). Subtract this from your sum of squares for treatments (4
degrees of freedom). The remainder (3 degrees of freedom) is the
sum of squares of deviations from regression. Complete the analysis
of variance. Test the hypothesis of 0 regression. What conclusions
do you draw?

12.28 The following are the yields (tons per acre) of sugar beets on plots
which, 2 years earlier, had been treated with lime:

Treatment                        Replications
(Tons per Acre)      1       2       3       4       5
1                   13.7    13.3    12.6    14.7    10.8
2                   16.9    17.1    14.7    15.7    15.4
3                   17.3    17.1    16.9    16.2    14.6
4                   17.8    16.5    17.9    15.7    16.3

Analyze the data. Test the hypothesis that there is no effect of
treatment. Plot the treatment means, Y, against the rate of
application of lime, X. Do you think the regression is linear? As evidence,
divide the sum of squares for treatments into 2 parts: attributable to
regression, 35.52; and remainder. Test the null hypothesis that there
is no deviation from linear regression. Instead of thinking about
regression, you might have divided the treatment sum of squares into
these 2 parts: (1) due to difference between mean of first treatment
and mean of the other 3 combined, 43.02; and (2) differences among
means of the last 3 treatments. What conclusions do you reach?

12.29 Consider an experiment to assess the relative effects of 4 different
treatments (i.e., packing pressures) on the function time of a certain
explosive actuator. Casings are available from 4 different production
lots. Four casings were randomly selected from each of the lots and
the treatments were assigned at random within each lot. Given the
data shown below (operation time in milliseconds), analyze and
interpret the results.

                       Packing Pressures (psi)
Blocks (Lots)     10,000    20,000    30,000    40,000
1                   12        17        10        12
2                   11        16         9        11
3                   10        15         8        11
4                    9        15         8        10

12.30 Analyze and interpret the following data on yields of sweet potatoes
obtained with various combinations of fertilizer (n = N, p = P2O5,

                Replicate 1                                  Replicate 2
npk  Yield    npk  Yield    npk  Yield        npk  Yield    npk  Yield    npk  Yield
133    45     211    39     333    70         212    83     211    56     133    65
111    34     313    62     311    40         221    52     321    49     112    48
221    42     222    65     212    45         322    65     333    92     311    56
323    69     233    92     132    53         313   101     122    75     332    79
213    58     123    56     121    54         111    50     312    86     213    95
331    51     332    91     223    69         331    61     232    74     222    81
232    72     131    73     322    85         132    89     223   109     231    84
122    56     112    55     113    60         123    90     113    68     323   103
312    82     321    75     231    78         233   122     131    98     121    64

Totals        509           608           554              713           707           675
Grand Totals  1671 (Replicate 1)                           2095 (Replicate 2)

12.31 Given the following abbreviated ANOVA:

Source of Variation    Degrees of Freedom    Mean Square
Replicates               4                    70
Treatments:
  A                      3                    50
  B                      3                   160
  AB                     9                    40
Experimental error      60                    10

Interpret the effects of a and b assuming that:

(1) the various levels of both a and b are fixed or selected;

(2) the various levels of both a and b are random variables;

(3) the levels of a are fixed, but the levels of b are random;

(4) the levels of a are random, but the levels of b are selected.

12.32 Mr. X sprayed apple leaves with different concentrations of a
nitrogen compound, then determined the amounts of nitrogen (mg. per
sq. dcm.) remaining on the leaves immediately and at two subsequent
times. The object was to learn the rate at which the nitrogen was
absorbed by the leaves. There were two replications of each treatment.
The first entry in each cell of the table is for the first replication.

              Levels of Nitrogen
Time       n1       n2       n3
t0        2.29     6.50     8.75
          2.24     5.94     9.52
t1        0.46     3.03     2.49
          0.19     1.00     2.04
t2        0        0.75     1.40
          0.26     1.16     1.81

Obtain the analysis of variance which subdivides the 8 degrees of
freedom for treatments into individual comparisons: N_L, N_Q, T_L,
T_Q, N_L T_L, N_L T_Q, N_Q T_L, and N_Q T_Q.

12.33 2³ FACTORIAL FIELD PLAN WITH YIELDS

Replicate 1         Replicate 2         Replicate 3         Replicate 4
(1)  7   b  24      ab  36   bc 31      a   28   ac  31     abc 66   (1) 11
abc 39   ac 31      (1) 19   ac 36      c   24   b   19     a   31   bc  29
a   30   c  21      abc 41   b  30      ab  35   (1) 13     c   21   ac  33
bc  27   ab 39      c   30   a  33      bc  26   abc 36     b   25   ab  43

Complete the analysis of variance, computing the treatment sum of
squares for each of the individual treatment effects, and subdividing
the experimental error corresponding to the subdivision of the
treatment sum of squares.

12.34 ANALYSIS OF VARIANCE

Source of Variation    Degrees of Freedom    Mean Square
Replicates               3                    192
A                        1                    100
B                        1                   2500
AB                       1                    900
Experimental error       9                     32

TABLE

          a0      a1      Sum
b0       120      80      200
b1       160     240      400
Sum      280     320      600

(1) Interpret the effects of a and b assuming both are fixed variates.

(2) Compute and interpret the 95 per cent confidence interval
estimate of the true mean difference between treatments
and

12.35 The following yields of grass were reported for one year in dry matter
per 1/57-acre plots. This was a randomized complete block design.

               Elephant Grass                Guatemala Grass
             (Harvests per year)           (Harvests per year)
Blocks       2       3       4             2       3       4
1           109     222     187           277     246     252
2            97     125     163           293     263     181
3           133     134     143           260     194     224
4           113     173     179           325     190     248

Discuss the complete 2X3 factorial experiment, displaying the
pertinent estimates; outline tentative conclusions before making the
analysis of variance and tests of hypotheses.

12.36 Analyze and interpret the following set of experimental data: crop:
oats; location: Flathus, Correctionville; year: 1944; comment:
yield in bushels per acre.

                      Replicate
Treatment       1        2        3        Total
n1p1k1         32.2     33.9     34.6      100.7
n2p1k1         37.4     40.9     38.9      117.2
n1p2k1         30.6     39.4     33.8      103.8
n2p2k1         52.4     48.0     43.9      144.3
n1p1k2         29.9     34.5     36.5      100.9
n2p1k2         42.2     29.9     34.1      106.2
n1p2k2         31.8     32.5     34.2       98.5
n2p2k2         46.6     49.5     46.7      142.8
Total         303.1    308.6    302.7      914.4

12.37 An experiment was conducted to assess the effects of 3 raw material
sources (i.e., suppliers) and 4 mixtures (i.e., compositions) on the
crushing strength of concrete blocks. Twenty-four blocks were
selected, 2 at random from those manufactured by each of the 12
treatments, and the experiment was conducted as a randomized
complete block with 2 replicates. The resulting data are given below.
Analyze and interpret.

                               Mixtures
Suppliers    Replicate     A      B      C      D
1            1            57     65     93    102
             2            46     73     92    108
2            1            26     44     81     96
             2            38     67     90     99
3            1            39     57     96    105
             2            40     60    100    116

12.38 The following is a randomized complete block design with two missing
plots. Fill in estimates for the missing values, and complete the
analysis of the data.

                               Treatment
Block                  1          2          3          4     Block Totals
1                     43         35         37         42     157
2                     45         39         40         47     171
3                     42         30         M''        43     115 + M''
4                     M'         43         48         49     140 + M'
5                     41         34         36         44     155
Treatment Totals   171 + M'     181     161 + M''     225     738 + M' + M''

CHAPTER 13

OTHER DESIGNS

THE COMPLETELY RANDOMIZED and randomized complete block designs
discussed in Chapters 11 and 12, respectively, are only two of the
many useful statistical designs that have been developed for special
situations. Unfortunately, not all of the available designs can be
discussed in this book. However, after considering such factors as
frequency of use and potential contribution to more efficient
experimentation, a select group of designs and analysis techniques has been
chosen for presentation in this chapter. Persons desiring information
on other designs should consult a professional statistician and/or refer
to the references at the end of this chapter.

13.1      LATIN   AND   GRAECO-LATIN   SQUARES 

The Latin square (LS) design is frequently used in agricultural and
industrial experimentation. It is a special design that permits the
researcher to assess the relative effects of various treatments when a
double type of blocking restriction is imposed on the experimental
units. Viewed in this way, the Latin square design is a logical extension
of the randomized complete block design, and two examples should be
sufficient to illustrate the ideas involved.

Example  13.1 

Suppose we have 5 fertilizer treatments to be investigated and 25
plots available for experimentation. If the soil shows a fertility trend
in two directions (say N→S and E→W), it would seem reasonable to
set up blocks of 5 plots in both directions. This is precisely what is
done under the names rows and columns. The treatments are then
applied at random, subject to the restriction that each treatment appear
but once in each row and each column.

Example  13.2 

Consider the problem of testing 4 machines to see if they differ
significantly in their ability to produce a certain manufactured part. It is
well known that different operators and different time periods in the
work day will have an effect on production. Thus, we set up 4 operators
as "columns" and 4 time periods as "rows" and then assign, at random,
the machines to the various cells in the square, subject to the restriction
that each machine be used only once by each operator and in each time
period.

These two examples should acquaint the reader with the basic concepts
involved in a Latin square design. The idea of a square is evident, of
course, since, if m treatments are to be investigated, we need m²
experimental units.

The basic assumption for a Latin square design with one observation
per experimental unit is that the observations may be represented by
the linear statistical model

$$Y_{ij(k)} = \mu + \rho_i + \gamma_j + \tau_k + \epsilon_{ij(k)}; \quad i = 1, \dots, m; \; j = 1, \dots, m; \; k = 1, \dots, m \tag{13.1}$$

where

$$\sum_{i=1}^{m} \rho_i = \sum_{j=1}^{m} \gamma_j = \sum_{k=1}^{m} \tau_k = 0$$

and the $\epsilon_{ij(k)}$ are independently and normally distributed with mean 0
and common variance $\sigma^2$. The subscript k is placed in parentheses to
indicate that it is not independent of i and j. The constants $\rho_i$, $\gamma_j$, and
$\tau_k$ are, of course, the true effects associated with the ith row, jth
column, and kth treatment, respectively.

Because of the possible economies due to reduced sample sizes, the
Latin square design has great appeal to researchers in all fields. In
particular, the engineer has been a prolific user of the Latin square design,
but, unfortunately, he has not always used the design wisely. An
examination of the postulated statistical model will show that the
interactions among rows, columns, and treatments have been assumed
to be 0. In many engineering or industrial experiments involving a
Latin square design (where the rows and columns usually refer to
real chemical, physical, or other factors), it is precisely this assumption
that appears to have been overlooked by the researcher. (NOTE:
When information about interactions is lacking or when the assumption
of 0 interaction is of doubtful validity, a full factorial should be run.)

Having pointed out the advantages and limitations of a Latin square
design, let us now summarize the appropriate calculations. These are:

$$\sum Y^2 = \text{total sum of squares} = \sum_{i=1}^{m} \sum_{j=1}^{m} Y_{ij(k)}^2 \tag{13.2}$$

$$M_{yy} = \text{sum of squares due to the mean} = T^2 / m^2 \tag{13.3}$$

$$R_{yy} = \text{row sum of squares} = \sum_{i=1}^{m} R_i^2 / m - M_{yy} \tag{13.4}$$

$$C_{yy} = \text{column sum of squares} = \sum_{j=1}^{m} C_j^2 / m - M_{yy} \tag{13.5}$$

$$T_{yy} = \text{treatment sum of squares} = \sum_{k=1}^{m} T_k^2 / m - M_{yy} \tag{13.6}$$

and

$$E_{yy} = \text{experimental error sum of squares} = \sum Y^2 - M_{yy} - R_{yy} - C_{yy} - T_{yy} \tag{13.7}$$

where $R_i$, $C_j$, and $T_k$ represent the indicated row, column, and
treatment totals, and T denotes the total of all the observations. The
resulting ANOVA is shown in Table 13.1.

TABLE 13.1-Generalized ANOVA for an m×m Latin Square
Design With One Observation per Experimental Unit

Source of            Degrees of       Sum of      Mean      Expected
Variation            Freedom          Squares     Square    Mean Square                                        F-Ratio
Mean                 1                $M_{yy}$    M
Rows                 m - 1            $R_{yy}$    R         $\sigma^2 + [m/(m-1)] \sum_{i=1}^{m} \rho_i^2$
Columns              m - 1            $C_{yy}$    C         $\sigma^2 + [m/(m-1)] \sum_{j=1}^{m} \gamma_j^2$
Treatments           m - 1            $T_{yy}$    T         $\sigma^2 + [m/(m-1)] \sum_{k=1}^{m} \tau_k^2$     T/E
Experimental error   (m - 1)(m - 2)   $E_{yy}$    E         $\sigma^2$
Total                m²               $\sum Y^2$

Example  13.3 

The  data  shown  in  Table  13.2  resulted  from  an  experiment  such  as 
described  in  Example  13.2.  Assuming  that  time  periods,  operators,  and 
machines  do  not  interact  (either  pairwise  or  as  a  complete  set),  the 
ANOVA  of  Table  13.3  is  obtained.  This  leads  to  the  conclusion  that 
there  are  significant  differences  among  the  outputs  of  the  4  machines. 
Further  examination  of  the  data  should  permit  selection  of  the  most 
productive  machine  or  machines. 
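
The arithmetic behind this example can be retraced with a short program. The sketch below is only an illustration, not part of the original analysis: it applies Equations (13.2) through (13.7) to the 16 observations of Table 13.2, and the names `SQUARE` and `latin_square_anova` are our own.

```python
from collections import defaultdict

# Table 13.2: rows are time periods 1-4, columns are operators 1-4;
# each cell is (units produced, machine letter).
SQUARE = [
    [(31, "C"), (43, "D"), (67, "A"), (36, "B")],
    [(39, "D"), (96, "A"), (40, "B"), (48, "C")],
    [(57, "B"), (33, "C"), (40, "D"), (84, "A")],
    [(85, "A"), (46, "B"), (48, "C"), (50, "D")],
]


def latin_square_anova(square):
    """Return the sums of squares, error mean square, and treatment F-ratio."""
    m = len(square)
    row_totals = [sum(y for y, _ in row) for row in square]
    col_totals = [sum(square[i][j][0] for i in range(m)) for j in range(m)]
    trt_totals = defaultdict(float)
    for row in square:
        for y, letter in row:
            trt_totals[letter] += y
    grand = sum(row_totals)

    total_ss = sum(y * y for row in square for y, _ in row)          # (13.2)
    mean_ss = grand ** 2 / m ** 2                                     # (13.3)
    rows_ss = sum(t * t for t in row_totals) / m - mean_ss            # (13.4)
    cols_ss = sum(t * t for t in col_totals) / m - mean_ss            # (13.5)
    trt_ss = sum(t * t for t in trt_totals.values()) / m - mean_ss    # (13.6)
    error_ss = total_ss - mean_ss - rows_ss - cols_ss - trt_ss        # (13.7)

    error_ms = error_ss / ((m - 1) * (m - 2))
    f_treatments = (trt_ss / (m - 1)) / error_ms
    return {
        "rows_ss": rows_ss,
        "cols_ss": cols_ss,
        "trt_ss": trt_ss,
        "error_ss": error_ss,
        "error_ms": error_ms,
        "f_treatments": f_treatments,
    }


anova = latin_square_anova(SQUARE)
for name, value in anova.items():
    print(f"{name:>13}: {value:10.3f}")
```

The sums of squares agree with those of Table 13.3 to the precision printed there, and the F-ratio for machines is the treatment mean square divided by the experimental error mean square, with 3 and 6 degrees of freedom.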

TABLE 13.2-Number of Units Produced by Four Machines
in a Latin Square Design

(The random assignment of the machines is shown
by the letters in parentheses)

                              Operators
Time Periods       1           2           3           4
1               31 (C)      43 (D)      67 (A)      36 (B)
2               39 (D)      96 (A)      40 (B)      48 (C)
3               57 (B)      33 (C)      40 (D)      84 (A)
4               85 (A)      46 (B)      48 (C)      50 (D)

TABLE 13.3-Abbreviated ANOVA for Data of Table 13.2

Source of Variation   Degrees of   Sum of      Mean         Expected Mean
                      Freedom      Squares     Square       Square
Time periods          3            408.188     136.06       $\sigma^2 + (4/3) \sum_{i=1}^{4} \rho_i^2$
Operators             3            88.688      29.56        $\sigma^2 + (4/3) \sum_{j=1}^{4} \gamma_j^2$
Machines              3            4946.688    1648.90**    $\sigma^2 + (4/3) \sum_{k=1}^{4} \tau_k^2$
Experimental error    6            515.874     85.98        $\sigma^2$

** Significant at α = 0.01.

Should a single observation be missing in an experiment conducted
according to an m×m Latin square design, its value may be estimated
using

$$M = \frac{m(R + C + T) - 2S}{(m-1)(m-2)} \tag{13.8}$$

where

R = sum of observations in the same row as the missing observation

C = sum of observations in the same column as the missing observation

T = sum of observations with the same treatment as the missing
observation

S = sum of all the actual observations.

After substituting the value of M in the table, the various sums of
squares are calculated as indicated above. However, it must be
remembered that the treatment sum of squares so calculated ($T_{yy}$) will be
biased upwards, and a correction must be applied before we test the
hypothesis $H: \tau_k = 0$ ($k = 1, \dots, m$). This correction is made by
computing a new treatment sum of squares ($T'_{yy}$) defined as

$$T'_{yy} = T_{yy} - Z \tag{13.9}$$

where

$$Z = \frac{[S - R - C - (m-1)T]^2}{(m-1)^2 (m-2)^2} \tag{13.10}$$



Remember that the degrees of freedom associated with experimental
error and total are each reduced by one (in view of the single missing
observation); that is, the degrees of freedom for experimental error
are now (m - 1)(m - 2) - 1, and the degrees of freedom for total are
now m² - 1. No example will be given for the above technique, but
one or two of the problems at the end of this chapter will illustrate the
principles involved.
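
Although no worked example is given in the text, the estimate of Equation (13.8) and the correction term of Equation (13.10) are easily sketched in code. The illustration below is hypothetical: it pretends that the observation 31 (machine C, time period 1, operator 1) of Table 13.2 had been lost, so that R, C, T, and S are the totals of the remaining observations. The function names are our own.

```python
def latin_square_missing_value(m, R, C, T, S):
    """Estimate of a single missing observation, Equation (13.8).

    R, C, T are the row, column, and treatment totals that should contain
    the missing cell, and S is the total of all actual observations.
    """
    return (m * (R + C + T) - 2 * S) / ((m - 1) * (m - 2))


def treatment_ss_bias(m, R, C, T, S):
    """Upward bias of the treatment sum of squares, Equation (13.10)."""
    return (S - R - C - (m - 1) * T) ** 2 / ((m - 1) ** 2 * (m - 2) ** 2)


# Hypothetical illustration: drop the 31 (C) from row 1, column 1 of Table 13.2.
m = 4
R = 177 - 31   # row 1 total without the missing cell
C = 212 - 31   # column 1 total without the missing cell
T = 160 - 31   # machine C total without the missing cell
S = 843 - 31   # grand total without the missing cell

M_hat = latin_square_missing_value(m, R, C, T, S)
bias = treatment_ss_bias(m, R, C, T, S)
error_df = (m - 1) * (m - 2) - 1   # one degree of freedom is lost

print(f"estimated missing value: {M_hat:.2f}")
print(f"bias to subtract from the treatment sum of squares: {bias:.2f}")
print(f"experimental error degrees of freedom: {error_df}")
```

The estimate would be inserted in the empty cell, the analysis carried out as usual, and the bias subtracted from the treatment sum of squares before the F-test is made with the reduced error degrees of freedom.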

By now the reader should be sufficiently adept at the calculations
involved in analyses of variance so that lengthy discussions of such
topics as subsampling, selected treatment comparisons, factorials,
analysis of response functions, estimation of components of variance,
and predictions of the relative efficiencies of various allocations of the
observations in terms of experimental and sampling units would be a
waste of time. Accordingly, we will do no more than state that the
techniques introduced in Chapters 11 and 12 may easily be extended
and adapted for use with Latin square designs. However, to make
certain that the previously mentioned extensions and adaptations are made
properly, a few problems requiring their use have been included in the
set at the end of this chapter.

Before terminating our discussion of the Latin square design, mention
must be made of its efficiency relative to completely randomized and
randomized complete block designs. (NOTE: This discussion will, of
course, be closely related to that of Section 12.7.) If we designate the
mean squares in the Latin square for rows, columns, and experimental
error by R, C, and E, respectively, we may readily evaluate the
efficiency of a Latin square design relative to either a completely
randomized or randomized complete block design. For the efficiency of a
Latin square design relative to a completely randomized design, we
calculate

$$\text{R.E.} = \frac{R + C + (m-1)E}{(m+1)E} \tag{13.11}$$

If, however, we wish to compare a Latin square design with what
might have happened had a randomized complete block design been
utilized (assuming the rows were used as blocks), the following formula
is appropriate:

$$\text{R.E.} = \frac{C + (m-1)E}{mE} \tag{13.12}$$

If columns were used as the blocks, we put R in place of C in Equation
(13.12).
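
As a hedged illustration, the two relative efficiencies may be evaluated for the mean squares of Table 13.3. The sketch assumes the customary denominator (m + 1)E in Equation (13.11); the function names are our own.

```python
def re_vs_completely_randomized(R, C, E, m):
    """Efficiency of the Latin square relative to a completely randomized design,
    taking Equation (13.11) as [R + C + (m - 1)E] / [(m + 1)E]."""
    return (R + C + (m - 1) * E) / ((m + 1) * E)


def re_vs_randomized_blocks(other_ms, E, m):
    """Efficiency relative to randomized complete blocks, Equation (13.12).

    other_ms is the mean square of the classification NOT used as blocks:
    C when rows are the blocks, R when columns are the blocks.
    """
    return (other_ms + (m - 1) * E) / (m * E)


# Mean squares from Table 13.3 (time periods = rows, operators = columns).
m, R, C, E = 4, 136.06, 29.56, 85.98

re_cr = re_vs_completely_randomized(R, C, E, m)
re_rows_as_blocks = re_vs_randomized_blocks(C, E, m)
re_cols_as_blocks = re_vs_randomized_blocks(R, E, m)

print(f"relative to completely randomized:    {re_cr:.3f}")
print(f"relative to RCB with rows as blocks:  {re_rows_as_blocks:.3f}")
print(f"relative to RCB with cols as blocks:  {re_cols_as_blocks:.3f}")
```

A value below 1 suggests that the additional restriction gained nothing in that particular experiment; a value above 1 suggests that it paid for itself.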

The concept of a Latin square design can be extended rather easily
to that of a Graeco-Latin square (G-LS) design. Rather than go into the
details of a Graeco-Latin square, we shall only indicate, by example,
the nature of the design. Those persons interested in using such a
design are advised to consult a professional statistician.

Example 13.4

Chew (21) describes an experiment which could be used to compare
five formulations (α, β, γ, δ, ε) for making concrete bricks, using material
from 5 batches, prepared on each of 5 days, and tested on 5 different
machines (A, B, C, D, E). One possible randomization of a Graeco-
Latin square design for this situation is shown in Table 13.4. It will be
noted that: (1) each Latin letter appears exactly once in each row and
each column, (2) each Greek letter appears exactly once in each row
and each column, and (3) each Latin letter appears exactly once with
each Greek letter.

TABLE 13.4-Symbolic Representation of the Graeco-Latin Square Used
in Example 13.4

Rows                     Columns (Days)
(Batches)        1       2       3       4       5
1               Aα      Bγ      Cε      Dβ      Eδ
2               Bβ      Cδ      Dα      Eγ      Aε
3               Cγ      Dε      Eβ      Aδ      Bα
4               Dδ      Eα      Aγ      Bε      Cβ
5               Eε      Aβ      Bδ      Cα      Dγ
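
The three properties listed in Example 13.4 can be checked mechanically. The sketch below is illustrative only; the square is entered as we read it from Table 13.4 (the Greek letters are an assumption wherever the print is unclear), and the function name is our own.

```python
# The square as read from Table 13.4: rows are batches, columns are days.
GRAECO_LATIN = [
    ["Aα", "Bγ", "Cε", "Dβ", "Eδ"],
    ["Bβ", "Cδ", "Dα", "Eγ", "Aε"],
    ["Cγ", "Dε", "Eβ", "Aδ", "Bα"],
    ["Dδ", "Eα", "Aγ", "Bε", "Cβ"],
    ["Eε", "Aβ", "Bδ", "Cα", "Dγ"],
]


def is_graeco_latin(square):
    """Check the three defining properties of a Graeco-Latin square."""
    m = len(square)
    latin = [[cell[0] for cell in row] for row in square]
    greek = [[cell[1] for cell in row] for row in square]

    def is_latin(grid):
        # Exactly m symbols, each once per row and once per column.
        symbols = {s for row in grid for s in row}
        rows_ok = all(len(set(row)) == m for row in grid)
        cols_ok = all(len({grid[i][j] for i in range(m)}) == m for j in range(m))
        return len(symbols) == m and rows_ok and cols_ok

    # Orthogonality: every Latin-Greek pair occurs exactly once.
    pairs = {cell for row in square for cell in row}
    return is_latin(latin) and is_latin(greek) and len(pairs) == m * m


print(is_graeco_latin(GRAECO_LATIN))
```

The same check applies to any other randomization of the design, which may be reassuring before the treatments are actually assigned.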

As tempting as Graeco-Latin squares are to the industrial
experimenter (because of the potential savings in numbers of observations),
they should be used with caution. This recommendation stems from the
same type of limitation that was emphasized for Latin square designs,
namely, no interactions are tolerated.

13.2     SPLIT   PLOTS 

A fairly simple design which is frequently used in experimental work
is the split plot (SP) design. In this design we are concerned with two
factors, but we wish more precise information on one of them than on
the other. Let us assume that we have factors a and b and desire more
accurate information on b than on a. The usual scheme is to assign the
various levels of factor a at random to the whole plots (main plots) in
each replicate as in a randomized complete block design. Following
this, the levels of b are assigned at random to the split plots (sub-plots)
within each whole plot. Under such a scheme of randomization, which
may arise not only from the desire for more precise information on one
factor than on another but also because of the nature of the factors and
the way in which they must be applied to the experimental units, the
analysis of variance appears as in Table 13.5.

TABLE 13.5-Generalized ANOVA for a Split Plot Design

Source of            Degrees of           Sum of          Mean      Expected Mean
Variation            Freedom              Squares         Square    Square                                                                                   F-Ratio
Mean                 1                    $M_{yy}$        M
Whole plots
  Replicates         r - 1                $R_{yy}$        R
  A                  a - 1                $A_{yy}$        A         $\sigma_1^2 + b\sigma_2^2 + rb \sum_{i=1}^{a} \alpha_i^2/(a-1)$                          $A/E_a$
  Whole plot error   (r - 1)(a - 1)       $(E_a)_{yy}$    $E_a$     $\sigma_1^2 + b\sigma_2^2$
Split plots
  B                  b - 1                $B_{yy}$        B         $\sigma_1^2 + ra \sum_{j=1}^{b} \beta_j^2/(b-1)$                                         $B/E_b$
  AB                 (a - 1)(b - 1)       $(AB)_{yy}$     AB        $\sigma_1^2 + r \sum_{i=1}^{a} \sum_{j=1}^{b} (\alpha\beta)_{ij}^2/[(a-1)(b-1)]$         $AB/E_b$
  Split plot error   a(r - 1)(b - 1)      $(E_b)_{yy}$    $E_b$     $\sigma_1^2$
Total                rab                  $\sum Y^2$

Example 13.5

An experiment similar to that described in Example 10.19 was
performed. However, in this case, there were six replicates, three
temperatures, and four levels of electrolyte. (NOTE: In contrast to
Example 10.19, heat paper was not a factor in this experiment.) The data
are given in Table 13.6 and the resulting ANOVA in Table 13.7. No
calculational details are reported, since these are assumed to be
straightforward. Further interpretation of the data is impossible because of lack
of information regarding the exact nature of the treatments.

TABLE 13.6-Activated Lives (in Hours) of 72 Thermal Batteries Tested
in a Split Plot Design Which Used Temperatures as Whole Plots
and Electrolytes as Split Plots

                                          Replicate
Temperature   Electrolyte      1      2      3      4      5      6
Low           A              2.17   1.88   1.62   2.34   1.58   1.66
              B              1.58   1.26   1.22   1.59   1.25   0.94
              C              2.29   1.60   1.67   1.91   1.39   1.12
              D              2.23   2.01   1.82   2.10   1.66   1.10
Medium        A              2.33   2.01   1.70   1.78   1.42   1.35
              B              1.38   1.30   1.85   1.09   1.13   1.06
              C              1.86   1.70   1.81   1.54   1.67   0.88
              D              2.27   1.81   2.01   1.40   1.31   1.06
High          A              1.75   1.95   2.13   1.78   1.31   1.30
              B              1.52   1.47   1.80   1.37   1.01   1.31
              C              1.55   1.61   1.82   1.56   1.23   1.13
              D              1.56   1.72   1.99   1.55   1.51   1.33

TABLE 13.7-Abbreviated ANOVA for Data of Table 13.6

Source of Variation             Degrees of   Sum of     Mean Square
                                Freedom      Squares
Whole plots
  Replicates                    5            4.1499     0.8300
  Temperatures                  2            0.1781     0.0890
  Whole plot error              10           1.3622     0.1362
Split plots
  Electrolytes                  3            1.9625     0.6542**
  Temperature × electrolyte     6            0.2105     0.0351
  Split plot error              45           1.2586     0.0280

** Significant at α = 0.01.
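
Since the calculational details are omitted above, a minimal sketch may help the reader who wishes to retrace them. It assumes the data of Table 13.6 as transcribed here (temperature, then electrolyte, then the six replicates) and computes the sums of squares in the usual way from the replicate, whole plot, and cell totals; the variable names are our own.

```python
# Table 13.6: LIVES[temperature][electrolyte] lists replicates 1-6.
LIVES = {
    "Low": {
        "A": [2.17, 1.88, 1.62, 2.34, 1.58, 1.66],
        "B": [1.58, 1.26, 1.22, 1.59, 1.25, 0.94],
        "C": [2.29, 1.60, 1.67, 1.91, 1.39, 1.12],
        "D": [2.23, 2.01, 1.82, 2.10, 1.66, 1.10],
    },
    "Medium": {
        "A": [2.33, 2.01, 1.70, 1.78, 1.42, 1.35],
        "B": [1.38, 1.30, 1.85, 1.09, 1.13, 1.06],
        "C": [1.86, 1.70, 1.81, 1.54, 1.67, 0.88],
        "D": [2.27, 1.81, 2.01, 1.40, 1.31, 1.06],
    },
    "High": {
        "A": [1.75, 1.95, 2.13, 1.78, 1.31, 1.30],
        "B": [1.52, 1.47, 1.80, 1.37, 1.01, 1.31],
        "C": [1.55, 1.61, 1.82, 1.56, 1.23, 1.13],
        "D": [1.56, 1.72, 1.99, 1.55, 1.51, 1.33],
    },
}

temps = list(LIVES)
elecs = ["A", "B", "C", "D"]
r, a, b = 6, len(temps), len(elecs)


def sq_sum(totals, divisor):
    """Sum of squared totals divided by the number of observations per total."""
    return sum(t * t for t in totals) / divisor


all_obs = [y for t in temps for e in elecs for y in LIVES[t][e]]
cf = sum(all_obs) ** 2 / (r * a * b)            # correction for the mean

rep_totals = [sum(LIVES[t][e][k] for t in temps for e in elecs) for k in range(r)]
temp_totals = [sum(sum(LIVES[t][e]) for e in elecs) for t in temps]
elec_totals = [sum(sum(LIVES[t][e]) for t in temps) for e in elecs]
whole_plot_totals = [sum(LIVES[t][e][k] for e in elecs) for t in temps for k in range(r)]
cell_totals = [sum(LIVES[t][e]) for t in temps for e in elecs]

ss_rep = sq_sum(rep_totals, a * b) - cf
ss_temp = sq_sum(temp_totals, r * b) - cf
ss_whole_plots = sq_sum(whole_plot_totals, b) - cf
ss_error_a = ss_whole_plots - ss_rep - ss_temp
ss_elec = sq_sum(elec_totals, r * a) - cf
ss_inter = sq_sum(cell_totals, r) - cf - ss_temp - ss_elec
ss_total = sum(y * y for y in all_obs) - cf
ss_error_b = ss_total - ss_whole_plots - ss_elec - ss_inter

ms_error_a = ss_error_a / ((r - 1) * (a - 1))
ms_error_b = ss_error_b / (a * (r - 1) * (b - 1))
f_temp = (ss_temp / (a - 1)) / ms_error_a       # tested against whole plot error
f_elec = (ss_elec / (b - 1)) / ms_error_b       # tested against split plot error

print(f"replicates {ss_rep:.4f}  temperatures {ss_temp:.4f}  error(a) {ss_error_a:.4f}")
print(f"electrolytes {ss_elec:.4f}  interaction {ss_inter:.4f}  error(b) {ss_error_b:.4f}")
print(f"F(temperatures) {f_temp:.2f}  F(electrolytes) {f_elec:.2f}")
```

Note that temperatures are tested against the whole plot error and electrolytes against the split plot error, as the F-ratios of Table 13.5 require. The sums of squares should agree with Table 13.7 apart from rounding in the last digit.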


Before leaving the subject of split plot designs, we must take note
that the principle of "splitting" may be carried on for several stages;
that is, we may employ split-split plot designs, etc. For more detailed
discussion of such designs and for some illustrative examples, one
should consult the references at the end of the chapter.


13.3      COMPLETE  FACTORIALS  WITHOUT   REPLICATION, 
FRACTIONAL   FACTORIALS,   AND    INCOMPLETE 
BLOCKS 

Because most experimenters are interested in investigating the effects
on a response variable of the simultaneous variation of many factors,
a large number of designs incorporate factorial treatment combinations.
However, as the number of factors increases, the size of the experiment
becomes prohibitive. In addition, it becomes difficult to control the
magnitude of the experimental error within reasonable bounds.

In an attempt to reduce the experimental error to a reasonable
magnitude, the principle of confounding (see Chapter 10) was utilized to
create a group of designs known as incomplete block designs. These
designs are so named because not all the treatment combinations are
present in each block, that is, the blocks are incomplete. With
adequate replication, these designs proved very useful in agricultural
experimentation.

Since incomplete block designs are not usually included in a first
course in statistical methods, the decision has been made to omit
discussion of them from this book. However, several of the references
listed at the end of the chapter discuss at length the methods of
analysis appropriate to such designs.

When engineers and physical scientists became interested in
statistically designed, multi-factor experiments, they decided that both
replicated complete factorials and incomplete block designs were
unsatisfactory in that they required too many experimental units.
Further, it was evident that, as a general rule, the experimental
errors in industrial experiments were much smaller than those
encountered in agricultural experiments. Because of the small
experimental errors, one common approach has been to avoid replication
(i.e., subject only one experimental unit to each treatment
combination) and to estimate the experimental error by pooling the mean
squares associated with the higher order interactions. (NOTE: This is
equivalent to assuming that the true high order interaction effects
are 0.) This technique, referred to in the title of this section as complete
factorials without replication, is, as we have said, used quite often.
(NOTE: Actually, it is a completely randomized design involving
factorial treatment combinations and utilizing only one experimental
unit per treatment combination.)

Rather than devote a lot of space to the discussion of the models
and assumptions for the many possible situations, let us consider an
example. It is hoped that this will prove sufficient for a reasonable
understanding of the principles involved. For those persons who wish
to consider the matter more thoroughly, I again recommend the
references listed at the end of the chapter.

Example  13.6 

Davies   (28)    considered  a  laboratory  experiment  to  investigate  the 
yield  of  an  isatin  derivative  as  a  function  of  acid  strength   (a),  time 
of  reaction  (b),  amount  of  acid  (c),  and  temperature  of  reaction  (d).  Two 
levels  of  each  factor  were  used,  namely: 
a:  87  per  cent,  93  per  cent 
b:    15  minutes,  30  minutes 
c:    35  ml.,  45  ml. 
d:  60°C.,  70°C. 
The  data  shown  in  Table  13.8  led  to  the  ANOVA  of  Table  13.9. 

TABLE 13.8-Yield of Isatin Derivative
(g. per 10 g. of base material)

                                      Temperature of Reaction (d)
Acid        Reaction             60 ± 1                        70 ± 1
Strength    Time            Amount of acid (c)           Amount of acid (c)
(a)         (b)            35 ml.        45 ml.         35 ml.        45 ml.
87          15 min.       6.08 (1)      6.31 (c)       6.79 (d)      6.77 (cd)
            30 min.       6.53 (b)      6.12 (bc)      6.73 (bd)     6.49 (bcd)
93          15 min.       6.04 (a)      6.09 (ac)      6.68 (ad)     6.38 (acd)
            30 min.       6.43 (ab)     6.36 (abc)     6.08 (abd)    6.23 (abcd)

Source: O. L. Davies (editor), Design and Analysis of Industrial Experiments, Second
Edition, Oliver and Boyd, Edinburgh, 1956, p. 275, Table 7.7. By permission of the author
and the publishers.

TABLE 13.9-Abbreviated ANOVA for Data of Table 13.8

Source of Variation                      Degrees of    Mean Square
                                         Freedom
Main effects
  A                                      1             0.1463
  B                                      1             0.0018
  C                                      1             0.0233
  D                                      1             0.2998
Two-factor interactions
  AB                                     1             0.0000
  AC                                     1             0.0046
  AD                                     1             0.1040
  BC                                     1             0.0176
  BD                                     1             0.2525
  CD                                     1             0.0028
Experimental error
  (pooled high order interactions)       5             0.0385

Source: O. L. Davies (editor), Design and Analysis of Industrial Experiments, Second
Edition, Oliver and Boyd, Edinburgh, 1956, p. 277, Table 7.72. By permission of the
author and the publishers.
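
The pooling of the higher order interactions can be made concrete with a short sketch. It is offered only as an illustration: it takes the 16 yields of Table 13.8, forms the usual plus-and-minus contrast for each main effect and two-factor interaction, and treats what remains of the total sum of squares (5 degrees of freedom) as experimental error. The names used are our own.

```python
from itertools import combinations

# Table 13.8: yield for each treatment combination of the 2^4 factorial.
YIELDS = {
    "(1)": 6.08, "a": 6.04, "b": 6.53, "ab": 6.43,
    "c": 6.31, "ac": 6.09, "bc": 6.12, "abc": 6.36,
    "d": 6.79, "ad": 6.68, "bd": 6.73, "abd": 6.08,
    "cd": 6.77, "acd": 6.38, "bcd": 6.49, "abcd": 6.23,
}
FACTORS = "abcd"
N = len(YIELDS)


def contrast(effect):
    """Signed sum of yields: a combination gets +1 for each factor of the
    effect at its high level and -1 for each at its low level."""
    total = 0.0
    for combo, y in YIELDS.items():
        sign = 1
        for factor in effect:
            sign *= 1 if factor in combo else -1
        total += sign * y
    return total


# Main effects and two-factor interactions, one degree of freedom each.
effects = list(FACTORS) + ["".join(p) for p in combinations(FACTORS, 2)]
mean_squares = {e.upper(): contrast(e) ** 2 / N for e in effects}

# Everything else (three- and four-factor interactions) is pooled as error.
grand = sum(YIELDS.values())
total_ss = sum(y * y for y in YIELDS.values()) - grand ** 2 / N
error_df = (N - 1) - len(effects)
error_ms = (total_ss - sum(mean_squares.values())) / error_df

for name, ms in mean_squares.items():
    print(f"{name:>3}: {ms:.4f}")
print(f"pooled error ({error_df} d.f.): {error_ms:.4f}")
```

The mean squares so obtained reproduce Table 13.9. The validity of any F-test based on the pooled error rests, of course, on the assumption that the true three- and four-factor interactions are 0.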


As helpful as it was, the approach taken in the two preceding
paragraphs and illustrated in Example 13.6 (i.e., the utilization of a
complete factorial without replication) was not enough. Experiments were
still too large to suit the researcher. Some other way had to be found
to reduce the size and cost. One such attempt was the development
of fractional factorials in which only some (a fraction) of the treatment
combinations are actually investigated.

Once an experimenter has decided that some form of fractional
factorial is appropriate for his needs, the question naturally arises: "Which
treatment combinations should be included in the experiment?" The
answer to this question depends, of course, on what assumptions the
experimenter is willing to make or, to phrase it differently, on what
information he is willing to forego. As we all know, you can seldom
get something for nothing, and the desired smaller experiment with
its associated savings can only be achieved at a cost, namely, the
cost of giving up part of the information usually derived from complete
factorials.

To illustrate the nature of a fractional factorial, let us consider a
one-half replicate of a 2^6 factorial. If we have 32 experimental units and
subject them to the treatment combinations shown in Table 13.10, the
principles introduced in Section 10.14 may be invoked to show the
equivalences (i.e., confoundings) of effects listed in Table 13.11. If the
experimenter is willing to assume that all interaction effects involving
three or more factors are 0, this fractional factorial is adequate to

TABLE 13.10-Treatment Combinations To Be Used in a One-Half Replicate
of a 2^6 Factorial in Which the Defining Relation is I = ABCDEF*

Experimental    Treatment       Experimental    Treatment
Unit            Combination     Unit            Combination
1               (1)             17              ad
2               de              18              ae
3               ef              19              adef
4               df              20              af
5               ab              21              bd
6               abde            22              be
7               abef            23              bdef
8               abdf            24              bf
9               ac              25              cd
10              acde            26              ce
11              acef            27              cdef
12              acdf            28              cf
13              bc              29              abcd
14              bcde            30              abce
15              bcef            31              abcdef
16              bcdf            32              abcf

* The use of the symbol I rather than M (as in Chapter 10) is to agree with convention.
The equality sign is used as an abbreviation for "is completely confounded with."


TABLE 13.11-Confounded Effects in a One-Half
Replicate of a 2^6 Factorial in Which the
Defining Relation is I = ABCDEF

I = ABCDEF       D = ABCEF       E = ABCDF       DE = ABCF
A = BCDEF        AD = BCEF       AE = BCDF       ADE = BCF
B = ACDEF        BD = ACEF       BE = ACDF       BDE = ACF
AB = CDEF        ABD = CEF       ABE = CDF       ABDE = CF
C = ABDEF        CD = ABEF       CE = ABDF       CDE = ABF
AC = BDEF        ACD = BEF       ACE = BDF       ACDE = BF
BC = ADEF        BCD = AEF       BCE = ADF       BCDE = AF
ABC = DEF        ABCD = EF       ABCE = DF       ABCDE = F

TABLE 13.12-Abbreviated ANOVA for the Experiment
of Table 13.10

Source of Variation                Degrees of Freedom
Mean                               1
A                                  1
B                                  1
C                                  1
D                                  1
E                                  1
F                                  1
AB                                 1
AC                                 1
AD                                 1
AE                                 1
AF                                 1
BC                                 1
BD                                 1
BE                                 1
BF                                 1
CD                                 1
CE                                 1
CF                                 1
DE                                 1
DF                                 1
EF                                 1
Experimental error
  (higher order interactions)      10
Total                              32

estimate  all  main  effects,  all  two-factor  interactions,  and  experimental 
error.  Under  such  an  assumption,  the  appropriate  ANOVA  is  as  given 
in  Table  13.12. 
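
The construction of such a half replicate, and of its table of confounded effects, can be sketched in a few lines. The illustration below assumes the fraction that contains treatment combination (1), which for the defining relation I = ABCDEF consists of the combinations with an even number of letters; the alias of an effect is its generalized interaction with ABCDEF. The function names are our own.

```python
from itertools import combinations

FACTORS = "abcdef"
DEFINING_WORD = "ABCDEF"


def all_subsets(letters):
    """Every subset of the letters, as strings in alphabetical order."""
    return ["".join(c) for k in range(len(letters) + 1)
            for c in combinations(letters, k)]


# Half replicate containing (1): combinations with an even number of letters.
half_replicate = [combo or "(1)" for combo in all_subsets(FACTORS)
                  if len(combo) % 2 == 0]


def alias(effect, word=DEFINING_WORD):
    """Generalized interaction: letters appearing in exactly one of the two."""
    letters = set(effect) ^ set(word)
    return "".join(sorted(letters)) or "I"


main_effects = [f.upper() for f in FACTORS]
two_factor = ["".join(p).upper() for p in combinations(FACTORS, 2)]

print(len(half_replicate), "treatment combinations")
for effect in main_effects + two_factor:
    print(f"{effect} = {alias(effect)}")

# Degrees of freedom left for error when three-factor and higher
# interactions are assumed to be 0 (the mean takes one of the 32).
error_df = len(half_replicate) - 1 - len(main_effects) - len(two_factor)
print("error degrees of freedom:", error_df)
```

Every main effect is confounded with a five-factor interaction and every two-factor interaction with a four-factor interaction, which is why the assumption about higher order interactions leaves all of them estimable.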

Fractional factorials have wide application in industrial
experimentation. Thus, it will pay research workers in both engineering and the
physical sciences to become better acquainted with these valuable aids
to efficient experimentation. As with other topics mentioned in this
section, it is felt that a detailed discussion is beyond the scope of this
book. For this reason, the interested reader is referred to the
publications listed at the end of the chapter.

13.4      UNEQUAL  BUT  PROPORTIONATE  SUBCLASS 
NUMBERS 

The reader will have noticed that practically all the recommended
statistical designs require a balanced configuration, that is, an equal
number of observations in each group. The one exception was the
completely randomized design. However, even in that case, it was
noticed that unequal numbers of observations in the subgroups could
lead to difficulties in interpretation. (See Section 11.4.)

In this section we propose to examine one other case of unequal
frequencies which presents little difficulty in the way of calculation. This
case involves a factorial set of treatment combinations in which the
cells of, say, the a×b table contain different numbers of observations
but these numbers happen to be proportional. That is, the number of
observations in the (ij)th cell is such that $n_{ij} = u_i v_j$, where
$u_1 : u_2 : \cdots : u_a$ are the proportions in the rows and
$v_1 : v_2 : \cdots : v_b$ are the proportions in the columns. Rather than
go into details, a numerical example will be given and it is hoped that
this will be sufficient to illustrate the ideas involved. Persons desiring
further details should consult the references at the end of the chapter.

Example  13.7 

Suppose we have 3 varieties of oats to be tested for yield differences and
that we also wish to investigate the effects of 3 fertilizers. There are 28
experimental plots available to the researcher. Further, we will assume
that from previous experiments we already know considerably more
about varieties B and C than about variety A; thus, we shall plant
variety A on twice as many plots as varieties B and C. It is also
considered desirable to assign the 3 fertilizers to the plots in the ratio
3:2:2; that is, we shall apply fertilizer No. 1 to 12 plots and each of
fertilizers No. 2 and No. 3 to 8 plots. The assignment of the
treatment combinations to the plots was made completely at random,
and the resulting yields, in bushels per acre, are recorded in Table 13.13.
Calculating the various sums of squares in the usual manner, we arrive
at the abbreviated ANOVA of Table 13.14.
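
The phrase "in the usual manner" may be spelled out in a short sketch. It is illustrative only: it uses the 28 yields of Table 13.13 as we read them, first verifies that the cell numbers are proportional, and then computes each sum of squares directly from the corresponding totals, each squared total being divided by the number of observations it contains. The names are our own.

```python
# Table 13.13: YIELDS[variety][fertilizer] lists the plot yields (bu. per acre).
YIELDS = {
    "A": {1: [50, 51, 52, 56, 60, 55], 2: [42, 40, 38, 38], 3: [55, 56, 56, 58]},
    "B": {1: [65, 69, 67], 2: [50, 50], 3: [62, 62]},
    "C": {1: [67, 67, 69], 2: [48, 50], 3: [65, 67]},
}
varieties = list(YIELDS)
fertilizers = [1, 2, 3]

n_cell = {(v, f): len(YIELDS[v][f]) for v in varieties for f in fertilizers}
n_var = {v: sum(n_cell[v, f] for f in fertilizers) for v in varieties}
n_fert = {f: sum(n_cell[v, f] for v in varieties) for f in fertilizers}
n_total = sum(n_cell.values())

# Proportional subclass numbers: n_ij * n.. = n_i. * n_.j in every cell.
proportional = all(n_cell[v, f] * n_total == n_var[v] * n_fert[f]
                   for v in varieties for f in fertilizers)

cell_tot = {(v, f): sum(YIELDS[v][f]) for v in varieties for f in fertilizers}
var_tot = {v: sum(cell_tot[v, f] for f in fertilizers) for v in varieties}
fert_tot = {f: sum(cell_tot[v, f] for v in varieties) for f in fertilizers}
grand = sum(cell_tot.values())
cf = grand ** 2 / n_total                      # correction for the mean

ss_var = sum(var_tot[v] ** 2 / n_var[v] for v in varieties) - cf
ss_fert = sum(fert_tot[f] ** 2 / n_fert[f] for f in fertilizers) - cf
ss_cells = sum(cell_tot[k] ** 2 / n_cell[k] for k in n_cell) - cf
ss_inter = ss_cells - ss_var - ss_fert
ss_total = sum(y * y for v in varieties for f in fertilizers
               for y in YIELDS[v][f]) - cf
ss_within = ss_total - ss_cells
df_within = n_total - len(n_cell)

print("proportional:", proportional)
print(f"varieties {ss_var:.1f}  fertilizers {ss_fert:.1f}  "
      f"interaction {ss_inter:.1f}  within {ss_within:.1f} ({df_within} d.f.)")
```

The results are close to the entries of Table 13.14, though not always identical in the last printed digit. Because the subclass numbers are proportional, the component sums of squares add to the total without any adjustment.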

TABLE 13.13-Yields of 3 Varieties of Oats Subjected to 3 Different
Fertilizer Treatments
(In bushels per acre)

Oat                                  Fertilizer
Variety              1                       2                   3
A          50, 51, 52, 56, 60, 55     42, 40, 38, 38      55, 56, 56, 58
B          65, 69, 67                 50, 50              62, 62
C          67, 67, 69                 48, 50              65, 67

TABLE 13.14-Abbreviated ANOVA for Data of Table 13.13

Source of Variation             Degrees of   Sum of     Mean
                                Freedom      Squares    Square
Treatments
  Varieties                     2            818.9      409.45
  Fertilizers                   2            1455.0     727.50
  Varieties × fertilizers       4            52.3       13.075
Among plots treated alike       19           100.5      5.289

13.5      UNEQUAL AND DISPROPORTIONATE SUBCLASS
NUMBERS

Let  us  now  examine  the  case  where  our  data  may  be  represented  by 
tlie  model 

Y*Sh  =  /*  +  «<  +  fo  +  («£)*  +  €„•*  i  =   1,   •   •   -  ,  a          (13  .  13) 

J  =   1,  •   •   '  ,  6 


where  the  various  terras  are  defined  as  before  but  the  n^  are  not  equal 
for  the  various  cells  of  the  aX&  table.  Further,  the  n^  are  not  propor 
tionate  as  they  were  in  Section  13.4.  What  difficulties  in  analysis  result 
from  this  fact?  Why  is  it  that  we  refer  to  the  case  of  "unequal  and  dis 
proportionate  subclass  numbers"  as  an  undesirable  situation?  The 
answer  is,  of  course,  because  we  encounter  complications  in  analyzing 
such  data.  Let  us  now  take  note  of  some  of  the  problems  that  arise. 

Suppose that we went ahead, ignoring the fact that the subclass numbers are disproportionate, and calculated the various sums of squares in the usual fashion. If this procedure were followed, we would find that the sums of squares so calculated (assuming that each sum of squares was calculated directly; that is, no sum of squares was obtained by subtraction) would not add up to the total sum of squares. In other words, because of the disproportionality of the subclass numbers, the different comparisons with which the sums of squares are associated are nonorthogonal. This, of course, would lead to biased test procedures unless some adjustment were made. The other major difficulty which arises when dealing with cases involving disproportionate subclass numbers is that the simple (unweighted) treatment means obtained from the data are biased estimates of the true treatment effects. This could lead to serious errors if inferences were made without attempting to correct for the above-mentioned bias.
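The loss of additivity described above can be checked numerically. The following Python sketch is an illustration only and is not part of the original analysis. It computes every sum of squares directly for the yields of Table 13.13, whose subclass numbers are proportionate, and then for a hypothetical disproportionate variant made by discarding the last two yields of the variety A, fertilizer 1 cell. The names `direct_ss` and `additivity_gap`, and the choice of which observations to discard, are assumptions made for the example.

```python
# Yields from Table 13.13: rows are varieties A, B, C; columns are fertilizers 1, 2, 3.
PROPORTIONATE = [
    [[50, 51, 52, 56, 60, 55], [42, 40, 38, 38], [55, 56, 56, 58]],
    [[65, 69, 67], [50, 50], [62, 62]],
    [[67, 67, 69], [48, 50], [65, 67]],
]


def mean(values):
    return sum(values) / len(values)


def direct_ss(cells):
    """Compute each sum of squares directly; none is obtained by subtraction."""
    n_rows, n_cols = len(cells), len(cells[0])
    everything = [y for row in cells for cell in row for y in cell]
    grand = mean(everything)
    row_obs = [[y for cell in row for y in cell] for row in cells]
    col_obs = [[y for row in cells for y in row[j]] for j in range(n_cols)]
    row_mean = [mean(obs) for obs in row_obs]
    col_mean = [mean(obs) for obs in col_obs]
    ss = {
        "rows": sum(len(o) * (m - grand) ** 2 for o, m in zip(row_obs, row_mean)),
        "columns": sum(len(o) * (m - grand) ** 2 for o, m in zip(col_obs, col_mean)),
        "interaction": 0.0,
        "within": 0.0,
        "total": sum((y - grand) ** 2 for y in everything),
    }
    for i in range(n_rows):
        for j in range(n_cols):
            cell = cells[i][j]
            cell_mean = mean(cell)
            # Interaction uses the weighted (observed) row and column means.
            dev = cell_mean - row_mean[i] - col_mean[j] + grand
            ss["interaction"] += len(cell) * dev ** 2
            ss["within"] += sum((y - cell_mean) ** 2 for y in cell)
    return ss


def additivity_gap(ss):
    """Sum of the four component sums of squares minus the total sum of squares."""
    parts = ss["rows"] + ss["columns"] + ss["interaction"] + ss["within"]
    return parts - ss["total"]


# Hypothetical disproportionate variant: drop two yields from the (A, 1) cell.
DISPROPORTIONATE = [[list(cell) for cell in row] for row in PROPORTIONATE]
DISPROPORTIONATE[0][0] = DISPROPORTIONATE[0][0][:4]

for label, data in (("proportionate", PROPORTIONATE),
                    ("disproportionate", DISPROPORTIONATE)):
    result = direct_ss(data)
    print(label, {k: round(v, 3) for k, v in result.items()},
          "gap =", round(additivity_gap(result), 3))
```

For the proportionate data the gap is zero apart from rounding error; for the altered data it is not, which is the nonorthogonality the text refers to.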

What, then, should be the method of analysis for such situations? The usual approach is to utilize regression techniques and obtain a general least squares solution. However, because of the many variations which may be employed (e.g., different models and/or different orders of estimating the unknown parameters), neither detailed explanations nor numerical illustrations of such solutions will be included in this book. If you should encounter a situation in which a general least squares solution is required, I would suggest that you do three things: (1) review the contents of Chapter 8; (2) study the appropriate sections of some of the references at the end of this chapter; and (3) consult a professional statistician.

13.6  RESPONSE SURFACE TECHNIQUES

One of the most significant contributions to statistical methodology in recent years has been the development of systematic procedures for determining, experimentally, those levels of the factors under investigation which produce an optimum response. These procedures, frequently referred to as response surface techniques, can be of value to researchers in almost every field of specialization. Unfortunately, a satisfactory description of the many ramifications of these techniques is more than can be accomplished in this text. Therefore, we shall be content with a few general observations on the topic and then refer the reader to other sources where these ideas are discussed in greater detail.

Response surface techniques are, in essence, a blending of regression analysis (Chapter 8) and experimental design (Chapters 10, 11, 12, and 13) to provide an economical means of locating a set of experimental conditions (i.e., a combination of factor levels) which will yield a maximum (or minimum) response. However, one very important feature has been added. That feature is the sequential nature of the exploration of the response surface. While it is true that most research is of the continuing variety (and therefore sequential), the majority of the techniques discussed heretofore in this book have been of the nonsequential type. Thus, the insertion of the sequential element into the pattern of the investigation is, from one point of view, a long overdue step.

In capsule form, the steps involved in the application of response surface techniques are as follows:

(1)  Choose base levels of the factors to be investigated. (Depending on the judgment of the experimenter, these levels may be close to or far removed from the optimum levels.)

(2)  Since, at this stage, linear effects are thought to be dominant over nonlinear effects, select one other level of each factor.

(3)  Utilizing either a complete or fractional $2^n$ factorial, estimate (by examining the effects; i.e., the linear regression coefficients) the direction in which the greatest gain may be expected.

(4)  Move in this direction, that is, along the path of steepest ascent, as far as the experimenter deems reasonable, and perform a second experiment (again utilizing a complete or fractional $2^n$ factorial).

(5)  Repeat steps (3) and (4) until a near-stationary region is found.

(6)  Then, utilizing a complete or fractional $3^n$ factorial or a composite design¹ to estimate the second order effects, explore the nature of the response surface in the near-stationary region and locate the optimum conditions.

¹ A composite design is essentially a complete $2^n$ factorial with sufficient points added to permit estimation of the second order effects.

It will be realized that the preceding steps are only an indication of the procedure. Depending on the problem and the assumptions that the experimenter is willing to make, the "rules" may be modified. (For example, $3^n$ factorials or composite designs might be used in step No. 3.) However, regardless of the details, the philosophy of sequential experimentation has much to recommend it. In fact, the concept of exploring a response surface in a sequential manner with the objective of locating a maximum (or minimum) point on the surface is one with which all experimenters should be familiar. For those who are interested in pursuing this topic further, an excellent exposition is available in Davies (28). Both the theory and the application of response surface techniques are also discussed in a number of the other references listed at the end of this chapter.
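Steps (3) and (4) of the capsule procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular reference: the factors are coded -1 and +1, the design is a complete $2^2$ factorial, and the response function, base levels (150 and 30), and unit sizes (10 and 5) are all hypothetical values chosen so the arithmetic is easy to follow.

```python
from itertools import product
from math import sqrt


def fit_first_order(runs):
    """Estimate the mean and linear coefficients from a complete 2^n factorial.

    runs: list of (coded_levels, response) pairs, with coded levels -1 or +1
    and every combination of levels present exactly once.
    """
    n_runs = len(runs)
    n_factors = len(runs[0][0])
    b0 = sum(y for _, y in runs) / n_runs
    # The design columns are orthogonal, so each coefficient is sum(x*y)/n_runs.
    slopes = [sum(x[i] * y for x, y in runs) / n_runs for i in range(n_factors)]
    return b0, slopes


def steepest_ascent_path(base_levels, unit_sizes, slopes, n_points):
    """Points along the path of steepest ascent, in natural units.

    unit_sizes gives the natural-unit change equal to one coded unit of each
    factor. Assumes at least one slope is nonzero.
    """
    norm = sqrt(sum(b * b for b in slopes))
    return [
        [base + k * (b / norm) * unit
         for base, b, unit in zip(base_levels, slopes, unit_sizes)]
        for k in range(1, n_points + 1)
    ]


def hypothetical_response(x1, x2):
    # Stand-in for running the experiment; purely linear for illustration.
    return 60.0 + 3.0 * x1 + 4.0 * x2


RUNS = [(x, hypothetical_response(*x)) for x in product((-1, 1), repeat=2)]
B0, SLOPES = fit_first_order(RUNS)
PATH = steepest_ascent_path([150.0, 30.0], [10.0, 5.0], SLOPES, n_points=3)

print("estimated coefficients:", B0, SLOPES)
for point in PATH:
    print("next trial at:", [round(v, 2) for v in point])
```

In practice the experimenter would run a new factorial around a point chosen on this path and repeat, as in step (5); the sketch stops at proposing the points.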

13.7  RANDOM BALANCE

Another recent contribution to experimental design is the concept of random balance investigations in multi-multi-factor experiments. As proposed by Satterthwaite (43), the random balance technique permits the researcher to screen a large number of possible contributing factors in an experiment involving a limited number of test runs. That is, random balance is a device for considering (simultaneously) the many factors involved and, as is always important, keeping the size of the experiment within reasonable bounds. When the experiment has been performed, examination of the results should permit isolation of the more important factors for further investigation.

In a random balance experiment, all factors and levels are considered by choosing at random the level of each factor to be used in forming a particular treatment combination. (NOTE: From a practical point of view, the following restriction on complete randomization has been found desirable: Each level of a particular factor should be used an equal, or nearly equal, number of times.) Since random balance experimentation was first proposed, there has been much discussion, both pro and con, as to its worth. Personally, I believe that random balance has much to recommend it and that we will see a rapid increase in its use, especially in industrial experimentation. However, the theory on which it is based has not been fully explored, and thus the controversy over its merits continues. For those interested in the possibilities and/or wisdom of using random balance in their own experimentation, I suggest a careful reading of the appropriate references at the end of this chapter.
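The construction described above, including the restriction in the NOTE, can be sketched as follows. This is one simple way to satisfy the restriction and is not claimed to be Satterthwaite's own procedure; the function names, the factor list, and the run count of 12 are hypothetical.

```python
import random
from collections import Counter


def random_balance_design(levels_per_factor, n_runs, seed=None):
    """Return n_runs treatment combinations with randomly assigned levels.

    For each factor, a column is built in which every level appears an equal,
    or nearly equal, number of times; the column is then shuffled
    independently of all other factors.
    """
    rng = random.Random(seed)
    columns = []
    for levels in levels_per_factor:
        levels = list(levels)
        repeats, extra = divmod(n_runs, len(levels))
        # Any leftover runs are given to distinct, randomly chosen levels.
        column = levels * repeats + rng.sample(levels, extra)
        rng.shuffle(column)
        columns.append(column)
    # Transpose so that each tuple is one test run.
    return [tuple(run) for run in zip(*columns)]


def level_counts(design, factor_index):
    """How many times each level of one factor appears in the design."""
    return Counter(run[factor_index] for run in design)


# Hypothetical screening problem: eight 2-level and two 3-level factors, 12 runs.
FACTOR_LEVELS = [("-", "+")] * 8 + [("low", "mid", "high")] * 2
DESIGN = random_balance_design(FACTOR_LEVELS, n_runs=12, seed=1)

for run in DESIGN:
    print(run)
```

Because each column is shuffled separately, the factors are not orthogonal by construction; that is part of what the controversy mentioned in the text is about.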

13.8  OTHER DESIGNS AND TECHNIQUES

As was stated in the opening paragraph of this chapter, the designs and analysis techniques that have been developed for special purposes are many. Thus, it has been possible to mention only a few in this book. The two most common designs, the completely randomized design and the randomized complete block design, were discussed in detail in Chapters 11 and 12, respectively. In this chapter a few of the more specialized designs and techniques have been described or alluded to. An examination of some of the references which follow will bring many other special designs to your attention. It is my hope that the presentation thus far, brief though it has sometimes been, will have whetted your appetite and that you will continue your readings and studies in the related areas of experimental design and research techniques.

Problems 

13.1  A 5 × 5 Latin square was laid out to test the effects of 5 fertilizers on the yield of potatoes. Perform a complete analysis of the data.

                              Column
Row         1        2        3        4        5       Totals
1           A 449    B 444    C 401    D 299    E 292    1885
2           B 463    C 375    D 323    E 264    A 415    1840
3           C 393    D 353    E 278    A 404    B 425    1853
4           D 371    E 241    A 441    B 410    C 392    1855
5           E 258    A 430    B 450    C 385    D 347    1870
Column
totals      1934     1843     1893     1762     1871     9303

Treatment totals
A: 2139    B: 2192    C: 1946    D: 1693    E: 1333

13.2        Shown  below  are  the  yields  (cwt.  per  1/40-acre  plots)  of  sugar  cane 
in  a  Latin  square  experiment  comparing  fertilizers. 


A  14 
B  19 
D23 
C  21 

£23 


E22 
D21 
A  15 
#46 
C  16 


£20 
A  16 
C20 
£24 


C  18 
£23 
B  18 
£>21 
A  17 


D25 
C  18 
£23 
A  18 
B  19 


A: No fertilizer
B: Complete inorganic fertilizer
C: 10 tons manure per acre
D: 20 tons manure per acre
E: 30 tons manure per acre

What  conclusions  do  you  draw  from  this  experiment? 

13.3  Analyze the following data from a cacao experiment consisting of 3 separately randomized Latin squares. The 3 treatments were:

A: No fertilizer (check)
B: 1.5 lbs. superphosphate per tree
C: 3 lbs. superphosphate per tree

The field plans of the squares, together with plot yields in average pods per tree, are as follows:

Square 1
B 41    C 25    A 15
A 20    B 32    C 24
C 22    A 12    B 21

Square 2
C 27    B 28    A  3
A  4    C 17    B  9
B 22    A  4    C 17

Square 3
A 11    C 15    B 17
B 24    A 14    C 33
C 22    B 20    A 15

Note: Do not consider a transformation because these are averages (rounded) from the trees on 1/15-acre plots. The total numbers of pods were large enough to approximate a continuous distribution.
13.4  Five levels of a fertilizer were tried in a 5 × 5 Latin square. This is the analysis:

               Degrees of Freedom    Mean Square
Rows                   4                 25
Columns                4                 20
Treatments             4                 28
Error                 12                 15

The sums of the yields in the 5 plots of each level were:

Level              1     2     3     4     5
Sum of yields      2    14    26    30    28

Subdivide the 4 degrees of freedom for treatments into

                        d.f.
Linear regression         1
Second degree term        1
Remainder                 2

Is any comparison significant?
13.5  Crop: wheat; Location: R. W. Gt. Harpenden (175); Year: 1935; Type: 6 × 6 Latin square; Comment: yield in pounds of grain per 1/40-acre plot

Each plot shows the treatment number in parentheses, followed by the yield.

                                                                       Total
(4) 77.2    (0) 88.0    (2) 89.7    (1) 92.6    (3) 72.1    (5) 76.2    495.8
(3) 93.2    (4) 95.8    (0) 94.1    (5) 93.9    (1) 91.6    (2) 67.3    535.9
(5) 90.2    (2) 87.0    (3) 86.1    (4) 85.5    (0) 93.4    (1) 68.5    510.7
(2) 72.5    (3) 76.7    (1) 96.3    (0) 95.3    (5) 95.9    (4) 78.2    514.9
(0) 84.2    (1) 96.5    (5) 98.5    (2) 81.6    (4) 90.1    (3) 81.8    532.7
(1) 77.0    (5) 91.9    (4) 95.1    (3) 86.3    (2) 82.8    (0) 60.5    493.6
Total 494.3     535.9       559.8       535.2       525.9       432.5   3083.6

Treatments                                                    Treatment Total
0: No (NH4)2SO4                                                    515.5
1: (NH4)2SO4 applied Oct. 26 at 0.4 cwt. of N/A                    522.5
2: (NH4)2SO4 applied Jan. 19 at 0.4 cwt. of N/A                    480.9
3: (NH4)2SO4 applied Mar. 18 at 0.4 cwt. of N/A                    496.2
4: (NH4)2SO4 applied Apr. 27 at 0.4 cwt. of N/A                    521.9
5: (NH4)2SO4 applied May 24 at 0.4 cwt. of N/A                     546.6

Analyze and interpret the above data.

13.6  We wish to conduct a field experiment to test the yielding ability of 6 varieties of soybeans and have available an area of land sufficient for 36 plots. Indicate the proper subdivision of the total degrees of freedom for the following experimental designs:

(a)  completely randomized

(b)  randomized complete block

(c)  Latin square.

Indicate, by means of arrows, the proper F-tests for testing variety differences in each design.

13.7  Given that the data shown below resulted from an experiment such as described in Example 13.4, perform the analysis and give your interpretations of the results.*

Rows                     Columns (Days)
(Batches)      1       2       3       4       5
1             257     230     279     287     202
2             245     283     245     280     260
3             182     252     280     246     250
4             203     204     227     193     259
5             231     271     266     334     338

* The treatments are assumed to have been imposed exactly as shown in Example 13.4.

13.8  An experiment was conducted to assess the relative resistances to abrasion of four grades of leather (A, B, C, D). A machine was used in which the samples could be tested in any one of four positions. Since different runs (replications) are known to yield variable results, it was decided to make four runs. A Latin square design was utilized and the following results obtained. Analyze and interpret the data.

                      Position
Run        1          2          3          4
1        118(B)     136(D)     168(A)     135(C)
2        127(D)     141(B)     129(C)     151(A)
3        174(A)     173(C)     126(B)     134(D)
4        130(C)     170(A)     125(D)      95(B)

13.9  Another experiment such as described in Problem 13.8 was conducted at a second laboratory. In this case, the data shown below were obtained. Analyze and interpret. (NOTE: M represents a missing observation.)


Run 

4 

2 

1 

3 

2 
3 
1 
4 


-4(150) 
2?(130) 


£(98) 


Position 


£(145) 
C(172) 
Z?(132) 
-4(171) 


-4(170) 
£(115) 
C(132) 


C(133) 
£(127) 
.4(170) 
Z?(120) 


13.10  The experiment described in Problem 13.8 was conducted once more, this time at a third laboratory. Analyze and interpret the data which follow. [HINT: Use Equation (13.8) and the iterative technique discussed in Section 12.13.]

                      Position
Run        1          2          3          4
1        C(131)     D(M')      A(167)     B(136)
2        D(139)     A(196)     B(140)     C(148)
3        B(157)     C(133)     D(140)     A(184)
4        A(185)     B(146)     C(M'')     D(150)

13.11  On checking the original data sheets, it was discovered that the technician took two independent abrasion readings on the samples tested in the experiment described in Problem 13.8. The second set of readings is reproduced below. Pooling these data with those given in Problem 13.8, analyze and interpret the complete results.

                      Position
Run        1          2          3          4
1        120(B)     130(D)     165(A)     140(C)
2        125(D)     142(B)     120(C)     140(A)
3        175(A)     180(C)     120(B)     140(D)
4        132(C)     170(A)     130(D)     102(B)

13.12  An experiment was performed to compare the effects of three catalysts on the yield of a chemical process. Three runs were started, one using catalyst A, another using B, and the third C. After 3 days, a sample was drawn from each run and an analysis performed. A similar operation (i.e., taking samples and performing the analyses) was performed after 5 days. The whole experiment was repeated four times. Analyze and interpret the resulting data.

CODED YIELDS OF AN UNSPECIFIED CHEMICAL PROCESS

                                    Catalyst
                     A                  B                  C
Replicate     3 days   5 days    3 days   5 days    3 days   5 days
1               68       82        90       96        82       88
2               83       79        68       80        71       78
3               66       75        70       91        68       78
4               66       76        84       92        74       80

13.13  A split-split plot design was used in an experiment concerned with the yield of cotton. Four replications (or blocks) were involved. Each main plot was subjected to one of two levels of irrigation, each subplot was subjected to one of three rates of planting, and each sub-subplot was subjected to one of three levels of fertilizer application. Analyze and interpret the following experimental yields.

CODED YIELDS FROM COTTON GROWING EXPERIMENT

              Rate of Planting                            Block
Irrigation    (Density of Plants)   Fertilizer Rate    1       2       3       4
Light         Thin                  None               9.0     8.2     8.5     8.2
                                    Average            9.5     8.1     8.8     7.9
                                    Heavy             10.6     9.4     8.8     8.6
              Medium                None               9.0     9.7    11.1     7.8
                                    Average            8.9     9.7    10.3     8.5
                                    Heavy              9.3    10.4     9.1     8.6
              Thick                 None               8.1     7.4     8.2     8.5
                                    Average            9.0     8.1     7.6     8.8
                                    Heavy              9.6     7.5     9.4     8.4
Heavy         Thin                  None               8.1    10.3     6.0     7.2
                                    Average            8.6    10.8    10.4    11.6
                                    Heavy             10.2    10.4    11.5    11.6
              Medium                None              12.2     9.8     9.1    11.0
                                    Average           11.0     9.5    11.7    13.2
                                    Heavy             12.0    12.4    11.6    13.0
              Thick                 None               7.9    13.4    12.0    11.7
                                    Average           10.0    14.2    12.2    13.8
                                    Heavy             12.5    14.0    13.8    13.4

13.14  The following data resulted from an unreplicated complete factorial. Analyze and interpret. State all your assumptions.

CODED YIELDS OF A CHEMICAL PROCESS

                   Concentration of Solvent
Temperature      Low      Medium      High
100               44        46         42
200               51        55         55
300               50        50         48

13.15  In a manufacturing company, the micrometers used in checking quality are themselves checked by use of gauge blocks. However, there are 5 departments and each has its own micrometers and gauge blocks. Because of a suspicion that there is too much variation among micrometers and/or gauge blocks, the quality control engineer ran a test utilizing a random sample of instruments. Analyze and interpret the following data.

Gauge                            Micrometer
Block        1          2          3          4          5
A          0.0110     0.0115     0.0130     0.0151     0.0121
B          0.0135     0.0127     0.0132     0.0155     0.0128
C          0.0127     0.0124     0.0132     0.0152     0.0130

13.16  An experiment was run to investigate the effect of temperature, type of powder, amount of powder, and packing pressure on the function time of an explosive actuator. An unreplicated complete factorial yielded the following data. Analyze and interpret.

FUNCTION TIME (IN MILLISECONDS)

                                                Type of Powder
                                    A                  B                  C
                Packing                       Amount of Powder (Mg.)
Temperature     Pressure
(°F.)           (psi)          5    10    15      5    10    15      5    10    15
-50             10,000        7.4   7.0   6.8    5.4   5.0   4.8    7.2   6.9   6.6
                15,000        7.5   7.2   6.7    5.5   5.2   4.7    7.2   6.6   6.5
                20,000        7.4   7.4   6.0    5.4   5.4   4.0    7.2   6.7   6.2
75              10,000        6.6   6.6   5.8    4.6   4.6   3.8    6.8   7.2   4.9
                15,000        6.8   6.6   6.6    4.8   4.6   4.6    6.9   7.0   5.0
                20,000        6.8   6.2   5.9    4.8   4.2   3.9    7.0   7.1   5.0
200             10,000        5.1   5.1   5.1    3.1   3.1   3.1    6.0   4.9   4.8
                15,000        5.1   4.8   4.9    3.1   2.8   2.9    6.4   4.8   4.1
                20,000        5.2   4.7   5.0    3.2   2.7   3.0    5.9   4.9   2.0

13.17  A complete but unreplicated factorial was used to investigate the effects of type of metal (a qualitative factor), amount of primary initiator (a quantitative factor), and packing pressure (a quantitative factor) on the firing time of explosive switches. Analyze and interpret the following data:

FIRING TIMES (IN MILLISECONDS)

             Primary
             Initiator         Packing Pressure (psi)
Metal        (Mg.)         12,000     20,000     28,000
2 s al          5           12.3       10.6       15.2
               10           10.4        9.5       15.0
               15            8.8        9.1       14.5
teflon          5           12.4       11.7       15.0
               10           11.0       11.0       14.6
               15           11.0        9.8       14.6

13.18  A certain type of capacitor was to be tested to assess its performance as a function of a number of specified factors. The four factors considered were:

(a) = potted (+) or not potted (-)
(b) = wedged (+) or not wedged (-)
(c) = impregnated (+) or not impregnated (-)
(d) = high temperature (+) or low temperature (-).

The performance characteristic measured was the high voltage breakdown when the capacitors were subjected to a voltage rise of 250 v/sec. Some hypothetical data which could have resulted from such an experiment are:

                  Level of Factor          High Voltage
Capacitor       a     b     c     d        Breakdown (kv)
1               -     -     -     -            10.7
2               +     -     -     +            11.4
3               -     +     -     +            12.2
4               +     +     -     -            13.0
5               -     -     +     +            10.6
6               +     -     +     -            12.1
7               -     +     +     -            12.0
8               +     +     +     +            13.2

Analyze and interpret the one-half replicate of a $2^4$ factorial described above.

13.19  An experiment was to be performed to assess the effects of the following factors on the surge voltage of a specific model of thermal battery: temperature, humidity, amount of electrolyte, amount of heat paper, and type of electrolyte. These five factors were denoted as a, b, c, d, and e, respectively. Since this was only a preliminary experiment (in the development phase) and since all three-, four-, and five-factor interactions could be assumed to be negligible, a one-half replicate of the $2^5$ factorial was performed. Analyze and interpret the following data.

Treatment Combination     Surge Voltage (volts)
(1)                              14.0
ae                               14.6
be                               11.7
ab                               16.3
ce                               11.2
ac                               16.6
bc                               15.6
abce                             10.2
de                               13.9
ad                               13.8
bd                               15.1
abde                             13.2
cd                               14.6
acde                             14.3
bcde                             12.6
abcd                             15.4

CHAPTER 14

ANALYSIS OF COVARIANCE

IN PRECEDING CHAPTERS great emphasis has been placed on two very important techniques, namely, regression analysis and analysis of variance. Further, in certain Sections (11.11, 12.11, and 13.6) these two techniques were combined to handle particular problems associated with the exploration of response curves or surfaces. In the present chapter we shall investigate another blending of these two fundamental tools. This new technique, known as analysis of covariance, is one which has proved very useful in many areas of research.

14.1  USES OF COVARIANCE ANALYSIS

Before discussing the actual methods of covariance analysis, let us give one or two examples of situations in which the technique may be profitably employed. These examples should, of course, indicate to the reader the nature of the combination of the ideas of regression and analysis of variance. For the first example, consider a case where the researcher is interested in the effects of various rations on the weights of hogs. If a randomized complete block design is utilized and the final weights, Y, of the animals after a specified number of days of feeding are analyzed, the differences among the effects of the various rations may or may not be significant. In either case, however, the good researcher will think more about the conduct of the experiment before drawing any conclusions from the analysis of variance implied in the preceding sentence. He might say to himself, "If the experimental animals varied greatly with respect to their initial weights at the time the experiment was started, how do we know that differences among final weights reflect ration effects rather than just varying initial weights?" Calling the initial weights X, he might adjust the Y-values according to the associated X-values and then analyze and interpret the experimental data. The method by which this is carried out is known as covariance analysis.

Another example is the following: When dealing with an experiment to compare several methods of teaching statistics in which the criterion is to be the final score, Y, obtained by the students, all of whom take the same examination, final judgment concerning the various methods of teaching should not be rendered until the I.Q. ratings, X, of the individual students have been examined and the necessary corrections (adjustments) made. Many other examples could be given, and the reader is asked to formulate some in his own field of interest as an aid in better appreciating the techniques to be presented.

14.2  ASSUMPTIONS UNDERLYING ANALYSES OF COVARIANCE

As would be expected, the assumptions one makes when performing a covariance analysis are similar to those required for linear regression and analysis of variance. Thus we find the usual assumptions of independence, normality, homogeneous variances, fixed X's, etc. To be more specific, we give the mathematical models associated with some of the more common designs when a covariance analysis is contemplated.

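Completely randomized design

$$Y_{ij} = \xi + \tau_i + \beta X_{ij} + \epsilon_{ij}; \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n \tag{14.1}$$

Randomized complete block design

$$Y_{ij} = \xi + \rho_i + \tau_j + \beta X_{ij} + \epsilon_{ij}; \qquad i = 1, \ldots, r; \quad j = 1, \ldots, t \tag{14.2}$$

Latin square design

$$Y_{ij(k)} = \xi + \rho_i + \gamma_j + \tau_k + \beta X_{ij(k)} + \epsilon_{ij(k)}; \qquad i = 1, \ldots, m; \quad j = 1, \ldots, m; \quad k = 1, \ldots, m \tag{14.3}$$

Two-factor factorial in a randomized complete block design

$$Y_{ijk} = \xi + \rho_i + \alpha_j + \gamma_k + (\alpha\gamma)_{jk} + \beta X_{ijk} + \epsilon_{ijk}; \qquad i = 1, \ldots, r; \quad j = 1, \ldots, a; \quad k = 1, \ldots, c. \tag{14.4}$$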

In practice it is more customary to express these equations in terms of deviations of the X variable from its mean. When this is done, the equations appear as

$$Y_{ij} = \mu + \tau_i + \beta(X_{ij} - \bar{X}) + \epsilon_{ij}; \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n \tag{14.5}$$

$$Y_{ij} = \mu + \rho_i + \tau_j + \beta(X_{ij} - \bar{X}) + \epsilon_{ij}; \qquad i = 1, \ldots, r; \quad j = 1, \ldots, t \tag{14.6}$$

$$Y_{ij(k)} = \mu + \rho_i + \gamma_j + \tau_k + \beta(X_{ij(k)} - \bar{X}) + \epsilon_{ij(k)}; \qquad i = 1, \ldots, m; \quad j = 1, \ldots, m; \quad k = 1, \ldots, m \tag{14.7}$$

and

$$Y_{ijk} = \mu + \rho_i + \alpha_j + \gamma_k + (\alpha\gamma)_{jk} + \beta(X_{ijk} - \bar{X}) + \epsilon_{ijk}; \qquad i = 1, \ldots, r; \quad j = 1, \ldots, a; \quad k = 1, \ldots, c \tag{14.8}$$

respectively, where $\mu = \xi + \beta\bar{X}$, and $\bar{X}$ is the arithmetic mean of all the X's. The reason for writing the equations in this last form is that, by so doing, we simplify the algebra of the mathematical solution. Consequently, it is in this form that you will find the model presented in most texts. The usual assumptions are made concerning the various terms in each of the preceding equations, and the reader is advised to read again the appropriate sections in Chapters 8, 11, 12, and 13 to refresh his memory on such matters.
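The change of form involves only one algebraic step, shown here for the completely randomized design as a worked illustration. Writing $\xi$ for the constant term of (14.1) and $\bar{X}$ for the mean of all the X's, add and subtract $\beta\bar{X}$:

$$Y_{ij} = \xi + \tau_i + \beta X_{ij} + \epsilon_{ij} = (\xi + \beta\bar{X}) + \tau_i + \beta(X_{ij} - \bar{X}) + \epsilon_{ij} = \mu + \tau_i + \beta(X_{ij} - \bar{X}) + \epsilon_{ij}.$$

The same step, applied to (14.2), (14.3), and (14.4), gives (14.6), (14.7), and (14.8); only the meaning of the constant changes, not $\beta$ or the treatment effects.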

Although not mentioned above, there is another item which is commonly considered as a necessary assumption for a valid analysis of covariance, namely, that the concomitant variable, X, should not be affected by the treatments. That is, the treatments which have been applied to the experimental units so that we may observe and judge their effects on the Y variable should not influence the observed values of X. However, this is too restrictive an assumption. Even though the treatments do affect the X-values, a covariance analysis may be profitably employed if proper care is exercised in the interpretation of the experimental results. It is clear, then, that the inferences which may be made are different in the two cases, depending upon whether or not the X variable is affected by the treatments. The researcher is therefore cautioned to be extremely careful when dealing with the interpretation of covariance analyses. Let us now consider some special cases so that the reader will not only gain practice in the interpretation of data amenable to an analysis of covariance but also become familiar with the details of the computational procedure.

14.3  COMPLETELY RANDOMIZED DESIGN

As our first example of a covariance analysis we shall make use of a completely randomized design. Before giving a numerical example, let us examine the problem in general form. Assuming that we have t treatments, or groups, and that there are $n_i$ observations on each of X and Y in the ith group, it is customary to proceed as follows: Calculate the following sums of squares and products and then complete Table 14.1 as indicated. (NOTE: There is a great similarity between Table 14.1 and Table 8.22.)

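$$\sum x^2 = \text{corrected total sum of squares for } X = \sum_{i=1}^{t}\sum_{j=1}^{n_i} X_{ij}^2 - \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} X_{ij}\right)^2}{\sum_{i=1}^{t} n_i} \tag{14.9}$$

$$\sum xy = \text{corrected total sum of products for } X \text{ and } Y = \sum_{i=1}^{t}\sum_{j=1}^{n_i} X_{ij}Y_{ij} - \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} X_{ij}\right)\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} Y_{ij}\right)}{\sum_{i=1}^{t} n_i} \tag{14.10}$$

$$\sum y^2 = \text{corrected total sum of squares for } Y = \sum_{i=1}^{t}\sum_{j=1}^{n_i} Y_{ij}^2 - \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} Y_{ij}\right)^2}{\sum_{i=1}^{t} n_i} \tag{14.11}$$

$$T_{xx} = \text{treatment sum of squares for } X = \sum_{i=1}^{t}\frac{\left(\sum_{j=1}^{n_i} X_{ij}\right)^2}{n_i} - \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} X_{ij}\right)^2}{\sum_{i=1}^{t} n_i} \tag{14.12}$$

$$T_{xy} = \text{treatment sum of products for } X \text{ and } Y = \sum_{i=1}^{t}\frac{\left(\sum_{j=1}^{n_i} X_{ij}\right)\left(\sum_{j=1}^{n_i} Y_{ij}\right)}{n_i} - \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} X_{ij}\right)\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} Y_{ij}\right)}{\sum_{i=1}^{t} n_i} \tag{14.13}$$

$$T_{yy} = \text{treatment sum of squares for } Y = \sum_{i=1}^{t}\frac{\left(\sum_{j=1}^{n_i} Y_{ij}\right)^2}{n_i} - \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} Y_{ij}\right)^2}{\sum_{i=1}^{t} n_i} \tag{14.14}$$

$$E_{xx} = \text{experimental error sum of squares for } X = \sum x^2 - T_{xx} \tag{14.15}$$

$$E_{xy} = \text{experimental error sum of products for } X \text{ and } Y = \sum xy - T_{xy} \tag{14.16}$$

$$E_{yy} = \text{experimental error sum of squares for } Y = \sum y^2 - T_{yy} \tag{14.17}$$

[Table 14.1: only the row labels survive. Source of variation: Among treatments; Among experimental units treated alike (within treatments); Among treatments + within treatments (= total); Difference for testing ...]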
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The  proper  F-ratio  for  testing  the  hypothesis  that  there  are  no  dif 
ferences  among  the  true  effects  of  the  t  treatments  on  the  Y  variable 
after  adjusting  for  the  effect  of  the  X  variable  is 


F  « 


A* 
z;,.,-  *- 
i=l 

with  degrees  of  freedom  v\  =  t  —  1  and 


It is customary, in addition to performing the F-test just indicated, to
present a table of adjusted treatment means as an aid in the interpretation
of the experimental results. The adjusted means may be found using the
formula

$$\text{adj. } \bar{Y}_i = \bar{Y}_i - b(\bar{X}_i - \bar{X}); \qquad i = 1, \ldots, t, \tag{14.19}$$

where b is the regression coefficient calculated from the experimental
error sums of squares and products, that is, $b = E_{xy}/E_{xx}$. The estimated
variance of an adjusted treatment mean is given by

$$\hat{V}(\text{adj. } \bar{Y}_i) = s_E^{2}\left[\frac{1}{n_i} + \frac{(\bar{X}_i - \bar{X})^{2}}{E_{xx}}\right] \tag{14.20}$$

and the standard error of an adjusted treatment mean by

$$s_{\text{adj. } \bar{Y}_i} = \sqrt{\hat{V}(\text{adj. } \bar{Y}_i)} = s_E\sqrt{\frac{1}{n_i} + \frac{(\bar{X}_i - \bar{X})^{2}}{E_{xx}}}. \tag{14.21}$$

The estimated variance of the difference between two adjusted treatment
means is, of course, given by

$$\hat{V}(\text{adj. } \bar{Y}_i - \text{adj. } \bar{Y}_j) = s_E^{2}\left[\frac{1}{n_i} + \frac{1}{n_j} + \frac{(\bar{X}_i - \bar{X}_j)^{2}}{E_{xx}}\right]. \tag{14.22}$$


It should be clear that the regression coefficient, β, in Equation (14.5)
has been assumed to be nonzero. If such were not the case, the introduction
of the concomitant variable, X, into the calculations would be
an unnecessary complication. Sometimes the researcher will wish to
check on this assumption. That is, he will consider the hypothesis
H: β = 0, rather than the assumption β ≠ 0. When this is done, he will
be interested in testing the validity of H. The proper F-ratio is

$$F = \frac{E_{xy}^{2}/E_{xx}}{s_E^{2}} \tag{14.23}$$

which has degrees of freedom $\nu_1 = 1$ and $\nu_2 = \sum_{i=1}^{t} n_i - t - 1$, where
$s_E^{2} = (E_{yy} - E_{xy}^{2}/E_{xx})/\nu_2$ is the error mean square of the deviations about regression.


TABLE 14.2-Gains in Weight (Y) and Initial Weights (X) of
Pigs in a Feeding Trial

                                Treatment
             1              2              3              4
           X     Y        X     Y        X     Y        X     Y
          30   165       24   180       34   156       41   201
          27   170       31   169       32   189       32   173
          20   130       20   171       35   138       30   200
          21   156       26   161       35   190       35   193
          33   167       20   180       30   160       28   142
          29   151       25   170       29   172       36   189
Total    160   939      146  1031      195  1005      202  1098

TABLE 14.3-Analysis of Covariance for Data in Table 14.2

                                           Sum of Squares and Products     Deviations About Regression
Source of Variation           Degrees of   Σx²       Σxy       Σy²         Σy² - (Σxy)²/Σx²   Degrees of   Mean
                              Freedom                                                         Freedom      Square
Among treatments                  3        365.46    451.21    2163.13
Among animals treated alike      20        361.50    496.83    5937.83         5255.01            19       276.58
Total                            23        726.96    948.04    8100.96         6864.61            22
Difference for testing among
  adjusted treatment means                                                     1609.60             3       536.53

Example  14.1 

Given the data of Table 14.2, the following calculations were made
and the results reported in Table 14.3.

$$\sum x^{2} = S_{xx} = T_{xx} + E_{xx} = (30)^{2} + \cdots + (36)^{2} - \frac{(703)^{2}}{24} = 726.96$$

$$\sum xy = S_{xy} = T_{xy} + E_{xy} = (30)(165) + \cdots + (36)(189) - \frac{(703)(4073)}{24} = 948.04$$

$$\sum y^{2} = S_{yy} = T_{yy} + E_{yy} = (165)^{2} + \cdots + (189)^{2} - \frac{(4073)^{2}}{24} = 8100.96$$

$$T_{xx} = \frac{(160)^{2} + (146)^{2} + (195)^{2} + (202)^{2}}{6} - \frac{(703)^{2}}{24} = 365.46$$

$$T_{xy} = \frac{(160)(939) + (146)(1031) + (195)(1005) + (202)(1098)}{6} - \frac{(703)(4073)}{24} = 451.21$$

$$T_{yy} = \frac{(939)^{2} + (1031)^{2} + (1005)^{2} + (1098)^{2}}{6} - \frac{(4073)^{2}}{24} = 2163.13$$

$$E_{xx} = S_{xx} - T_{xx} = 361.50$$

$$E_{xy} = S_{xy} - T_{xy} = 496.83$$

$$E_{yy} = S_{yy} - T_{yy} = 5937.83.$$

Carrying  out  the  F-test  outlined  in  Equation  (14.18),  we  obtain 
F = 536.53/276.58 = 1.94 with degrees of freedom $\nu_1 = 3$ and $\nu_2 = 19$. This
is  not  significant  at  the  5  per  cent  level,  and  thus  we  are  unable  to  reject 
the  hypothesis  of  no  differences  among  the  true  effects  of  the  4  treat 
ments  on  the  gain  in  weight  of  pigs  after  adjusting  for  the  varying 
initial  weights  of  the  experimental  animals.  Incidentally,  in  this  case 
the  same  decision  would  have  been  reached  had  no  adjustment  been 
made  for  the  concomitant  variable.  However,  in  many  instances  the 
conclusions  may  change  considerably  depending  on  whether  or  not  the 
covariance  technique  is  used,  and  thus  the  researcher  should  always 
see  if  it  is  applicable  to  the  problem  at  hand.  The  adjusted  treatment 
means  are  presented  in  Table  14.4. 
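The arithmetic of Example 14.1 can be retraced by machine. The following is a minimal sketch in Python, not part of the original text: the names `groups` and `sums_of_products` are illustrative assumptions, and the data are those of Table 14.2. It forms the sums of squares and products of Equations (14.9) to (14.17) and the F-ratios of Equations (14.18) and (14.23).

```python
# Table 14.2: (X, Y) = (initial weight, gain in weight) for each pig, by treatment.
groups = [
    [(30, 165), (27, 170), (20, 130), (21, 156), (33, 167), (29, 151)],
    [(24, 180), (31, 169), (20, 171), (26, 161), (20, 180), (25, 170)],
    [(34, 156), (32, 189), (35, 138), (35, 190), (30, 160), (29, 172)],
    [(41, 201), (32, 173), (30, 200), (35, 193), (28, 142), (36, 189)],
]


def sums_of_products(groups, u, v):
    """Return (total, treatment, error) corrected sums of products.

    u and v select the columns: 0 for X, 1 for Y. With u == v the
    results are sums of squares, as in Equations (14.9)-(14.17).
    """
    pairs = [p for g in groups for p in g]
    correction = sum(p[u] for p in pairs) * sum(p[v] for p in pairs) / len(pairs)
    total = sum(p[u] * p[v] for p in pairs) - correction
    treatment = sum(
        sum(p[u] for p in g) * sum(p[v] for p in g) / len(g) for g in groups
    ) - correction
    return total, treatment, total - treatment


sxx, txx, exx = sums_of_products(groups, 0, 0)
sxy, txy, exy = sums_of_products(groups, 0, 1)
syy, tyy, eyy = sums_of_products(groups, 1, 1)

n = sum(len(g) for g in groups)          # total number of animals
t = len(groups)                          # number of treatments

s_e = eyy - exy ** 2 / exx               # error deviations about regression
s_te = syy - sxy ** 2 / sxx              # treatments + error deviations
df1, df2 = t - 1, n - t - 1
f_adjusted = ((s_te - s_e) / df1) / (s_e / df2)      # Equation (14.18)
f_beta = (exy ** 2 / exx) / (s_e / df2)              # Equation (14.23)

print(f"Exx={exx:.2f} Exy={exy:.2f} Eyy={eyy:.2f}")
print(f"adjusted F={f_adjusted:.2f} on ({df1}, {df2}) d.f.; F for beta={f_beta:.2f}")
```

The same function serves for squares and for products, since a sum of squares is merely a sum of products of a variable with itself. The F-ratio for β is not reported in the example and is included only to show how Equation (14.23) would be applied to these data.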

TABLE 14.4-Calculation of Adjusted Treatment Means From
Data of Table 14.2

(X̄ = 29.29, Ȳ = 169.71, b = 496.83/361.50 = 1.374)

                                         Treatment
                                 1         2         3         4
X̄_i                            26.67     24.33     32.50     33.67
X̄_i - X̄                        -2.62     -4.96      3.21      4.38
b(X̄_i - X̄)                     -3.60     -6.82      4.41      6.02
Ȳ_i                           156.50    171.83    167.50    183.00
adj. Ȳ_i                      160.10    178.65    163.09    176.98
Standard error of adj. Ȳ_i      7.17      8.06      7.35      7.80
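The entries of Table 14.4 follow directly from Equations (14.19) to (14.22). A short Python sketch, offered as an illustration only (the variable and function names are assumptions, and the error quantities are taken as rounded in Table 14.3), is:

```python
from math import sqrt

n_i = 6                                   # animals per treatment (Table 14.2)
x_totals = [160, 146, 195, 202]           # treatment totals of X
y_totals = [939, 1031, 1005, 1098]        # treatment totals of Y
exx, exy, s2_e = 361.50, 496.83, 276.58   # error line of Table 14.3

b = exy / exx                                     # error regression coefficient
x_bar = sum(x_totals) / (n_i * len(x_totals))     # over-all mean of X
x_means = [tx / n_i for tx in x_totals]
y_means = [ty / n_i for ty in y_totals]

adjusted = []
for xm, ym in zip(x_means, y_means):
    adj = ym - b * (xm - x_bar)                                # Equation (14.19)
    se = sqrt(s2_e * (1 / n_i + (xm - x_bar) ** 2 / exx))      # Equation (14.21)
    adjusted.append((adj, se))


def se_difference(i, j):
    """Standard error of adj. mean i minus adj. mean j, from Equation (14.22)."""
    var = s2_e * (2 / n_i + (x_means[i] - x_means[j]) ** 2 / exx)
    return sqrt(var)


for k, (adj, se) in enumerate(adjusted, start=1):
    print(f"treatment {k}: adjusted mean {adj:.2f}, standard error {se:.2f}")
```

Because the intermediate quantities are not rounded here, the last digit may differ slightly from the tabled values. `se_difference(0, 1)` would give the standard error for comparing the first two adjusted means.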

14.4      RANDOMIZED  COMPLETE   BLOCK   DESIGN 

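When our data conform to a randomized complete block design, the
appropriate mathematical model is as given in Equation (14.6). The
analysis to be performed is given in Table 14.5, the quantities R_xx,
T_xx, E_xx, R_yy, T_yy, and E_yy being obtained as in any randomized complete
block design. The sums of products R_xy, T_xy, and E_xy are found
using the following equations, in which X_ij and Y_ij denote the observations
on the jth treatment in the ith replicate (i = 1, ..., r; j = 1, ..., t):

Σxy = S_xy = corrected total sum of products for X and Y

$$S_{xy} = \sum_{i=1}^{r}\sum_{j=1}^{t} X_{ij}Y_{ij} - \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{t} X_{ij}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{t} Y_{ij}\right)}{rt} \tag{14.24}$$

R_xy = replicate (block) sum of products for X and Y

$$R_{xy} = \frac{\sum_{i=1}^{r}\left(\sum_{j=1}^{t} X_{ij}\right)\left(\sum_{j=1}^{t} Y_{ij}\right)}{t} - \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{t} X_{ij}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{t} Y_{ij}\right)}{rt} \tag{14.25}$$

T_xy = treatment sum of products for X and Y

$$T_{xy} = \frac{\sum_{j=1}^{t}\left(\sum_{i=1}^{r} X_{ij}\right)\left(\sum_{i=1}^{r} Y_{ij}\right)}{r} - \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{t} X_{ij}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{t} Y_{ij}\right)}{rt} \tag{14.26}$$

E_xy = experimental error sum of products for X and Y

$$E_{xy} = S_{xy} - R_{xy} - T_{xy}. \tag{14.27}$$

The proper F-ratio for testing the hypothesis of no differences among
the true effects of the t treatments on the Y variable after adjusting for
the effect of the X variable is

$$F = \frac{(S_{T+E} - S_E)/(t - 1)}{S_E/[(r - 1)(t - 1) - 1]} = \frac{(S_{T+E} - S_E)/(t - 1)}{s_E^{2}} \tag{14.28}$$

with degrees of freedom $\nu_1 = t - 1$ and $\nu_2 = (r - 1)(t - 1) - 1$, where
$S_E = E_{yy} - E_{xy}^{2}/E_{xx}$ and $S_{T+E} = (T_{yy} + E_{yy}) - (T_{xy} + E_{xy})^{2}/(T_{xx} + E_{xx})$.

As in the case of a completely randomized design, we will wish to
calculate the adjusted treatment means and, possibly, to test the hypothesis
that β of Equation (14.6) is 0. The calculation of the adjusted
treatment means is easily carried out using

$$\text{adj. } \bar{Y}_{.j} = \bar{Y}_{.j} - b(\bar{X}_{.j} - \bar{X}); \qquad j = 1, \ldots, t, \tag{14.29}$$

where b is the regression coefficient computed from the experimental
error sums of squares and products, that is, $b = E_{xy}/E_{xx}$. The estimated
variance of an adjusted treatment mean is

$$\hat{V}(\text{adj. } \bar{Y}_{.j}) = s_E^{2}\left[\frac{1}{r} + \frac{(\bar{X}_{.j} - \bar{X})^{2}}{E_{xx}}\right] \tag{14.30}$$

and the standard error of an adjusted treatment mean is

$$s_{\text{adj. } \bar{Y}_{.j}} = \sqrt{\hat{V}(\text{adj. } \bar{Y}_{.j})} = s_E\sqrt{\frac{1}{r} + \frac{(\bar{X}_{.j} - \bar{X})^{2}}{E_{xx}}}. \tag{14.31}$$

The estimated variance of the difference between two adjusted treatment
means is

$$\hat{V}(\text{adj. } \bar{Y}_{.j} - \text{adj. } \bar{Y}_{.j'}) = s_E^{2}\left[\frac{2}{r} + \frac{(\bar{X}_{.j} - \bar{X}_{.j'})^{2}}{E_{xx}}\right]. \tag{14.32}$$

To test the hypothesis that β equals 0, we calculate

$$F = \frac{E_{xy}^{2}/E_{xx}}{s_E^{2}} \tag{14.33}$$

which has degrees of freedom $\nu_1 = 1$ and $\nu_2 = (r - 1)(t - 1) - 1$.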


TABLE 14.6-Yields for 3 Varieties of a Certain Crop in a Randomized
Complete Block Design With 4 Blocks

(X = yield of a plot in a preliminary year under uniformity
trial conditions; Y = yield on the same plot in the
experimental year when the 3 varieties were used)

                          Varieties            Block
Block                  A       B       C       Totals
1                X     54      51      57       162
                 Y     64      65      72       201
2                X     62      64      60       186
                 Y     68      69      70       207
3                X     51      47      46       144
                 Y     54      60      57       171
4                X     53      50      41       144
                 Y     62      66      61       189
Variety Totals   X    220     212     204       636
                 Y    248     260     260       768

Reproduced from Table 7 in John Wishart, Field Trials II: The Analysis of Covariance,
Tech. Comm. No. 15, Commonwealth Bureau of Plant Breeding and Genetics, School of
Agriculture, Cambridge, England, May, 1950. With permission of author and publishers.

Example  14.2 

Consider  the  data  of  Table  14.6.  These  data  have  been  examined  in 
considerable  detail  by  Wishart  (17);  we  shall,  however,  consider  them 
from  a  more  limited  point  of  view,  which  will  be  sufficient  for  our  pur 
poses.  The  required  calculations  are 


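$$S_{xx} = (54)^{2} + \cdots + (41)^{2} - \frac{(636)^{2}}{12} = 514$$

$$R_{xx} = \frac{(162)^{2} + (186)^{2} + (144)^{2} + (144)^{2}}{3} - \frac{(636)^{2}}{12} = 396$$

$$T_{xx} = \frac{(220)^{2} + (212)^{2} + (204)^{2}}{4} - \frac{(636)^{2}}{12} = 32$$

$$E_{xx} = 514 - 396 - 32 = 86$$

$$S_{yy} = (64)^{2} + \cdots + (61)^{2} - \frac{(768)^{2}}{12} = 324$$

$$R_{yy} = \frac{(201)^{2} + (207)^{2} + (171)^{2} + (189)^{2}}{3} - \frac{(768)^{2}}{12} = 252$$

$$T_{yy} = \frac{(248)^{2} + (260)^{2} + (260)^{2}}{4} - \frac{(768)^{2}}{12} = 24$$

$$E_{yy} = 324 - 252 - 24 = 48$$

$$S_{xy} = (54)(64) + \cdots + (41)(61) - \frac{(636)(768)}{12} = 286$$

$$R_{xy} = \frac{(162)(201) + (186)(207) + (144)(171) + (144)(189)}{3} - \frac{(636)(768)}{12} = 264$$

$$T_{xy} = \frac{(220)(248) + (212)(260) + (204)(260)}{4} - \frac{(636)(768)}{12} = -24$$

$$E_{xy} = 286 - 264 - (-24) = 46,$$

and these are summarized in Table 14.7.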

TABLE 14.7-Analysis of Covariance for Data of Table 14.6

                                         Sum of Squares and Products    Deviations About Regression
Source of Variation         Degrees of   Σx²      Σxy      Σy²          Σy² - (Σxy)²/Σx²   Degrees of   Mean
                            Freedom                                                        Freedom      Square
Replicates (blocks)             3        396      264      252
Treatments (varieties)          2         32      -24       24
Experimental error              6         86       46       48              23.4               5         4.68
Treatments + error              8        118       22       72              67.9               7
Difference for testing among
  adjusted variety means                                                    44.5               2        22.25
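For readers who wish to verify Table 14.7 by machine, the following Python sketch applies Equations (14.24) to (14.28) to the data of Table 14.6. It is an illustration under stated assumptions rather than a prescribed procedure: the helper name `rcb_products` and the row-by-block, column-by-variety layout of the lists are choices made here for convenience.

```python
# Table 14.6: rows are blocks 1-4, columns are varieties A, B, C.
X = [[54, 51, 57], [62, 64, 60], [51, 47, 46], [53, 50, 41]]
Y = [[64, 65, 72], [68, 69, 70], [54, 60, 57], [62, 66, 61]]


def rcb_products(U, V):
    """Total, block, treatment and error sums of products, Equations (14.24)-(14.27).

    Calling the function with U and V equal gives the sums of squares.
    """
    r, t = len(U), len(U[0])
    correction = sum(map(sum, U)) * sum(map(sum, V)) / (r * t)
    total = sum(u * v for ru, rv in zip(U, V) for u, v in zip(ru, rv)) - correction
    blocks = sum(sum(ru) * sum(rv) for ru, rv in zip(U, V)) / t - correction
    treats = sum(sum(cu) * sum(cv) for cu, cv in zip(zip(*U), zip(*V))) / r - correction
    return total, blocks, treats, total - blocks - treats


_, _, txx, exx = rcb_products(X, X)
_, _, txy, exy = rcb_products(X, Y)
_, _, tyy, eyy = rcb_products(Y, Y)

r, t = len(X), len(X[0])
s_e = eyy - exy ** 2 / exx                                   # error deviations
s_te = (tyy + eyy) - (txy + exy) ** 2 / (txx + exx)          # treatments + error
df1, df2 = t - 1, (r - 1) * (t - 1) - 1
f_adjusted = ((s_te - s_e) / df1) / (s_e / df2)              # Equation (14.28)
f_unadjusted = (tyy / (t - 1)) / (eyy / ((r - 1) * (t - 1))) # ordinary analysis of variance

print(f"Exx={exx:.0f} Exy={exy:.0f} Eyy={eyy:.0f}")
print(f"unadjusted F={f_unadjusted:.2f}; adjusted F={f_adjusted:.2f} on ({df1}, {df2}) d.f.")
```

Setting the unadjusted and adjusted F-ratios side by side shows how much of the error variation in Y is accounted for by the uniformity-trial yields.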

Before testing the hypothesis of no differences among the true effects
of adjusted varieties, let us test the hypothesis that the true regression
coefficient, β, is 0. After all, it is more reasonable to examine this point
first, for unless we can reject such a hypothesis, that is, unless we can
safely conclude that β ≠ 0, the decision to perform a regular analysis of
covariance is questionable. Accordingly, following Equation (14.33), we
compute F = [(46)²/86]/4.68 = 5.26 with degrees of freedom ν1 = 1 and
ν2 = 5. Since F = 5.26 lies between F.90(1,5) = 4.06 and F.95(1,5) = 6.61,
the regression falls short of significance at the 5 per cent level, although
it is significant at the 10 per cent level. With so few degrees of freedom
for error, we shall regard this as a sufficient indication that β is not 0
and proceed with the covariance analysis.

We shall now examine the variety differences. First, we note that an
ordinary analysis of variance would give rise to F = (24/2)/(48/6)
= 12/8 = 1.5 with degrees of freedom ν1 = 2 and ν2 = 6, and this would
not permit us to reject the hypothesis of no differences among the true
effects of the 3 varieties. Let us next observe what effect, if any, the
performance of a covariance analysis will have on our inferences. To test
the hypothesis of no differences among the true effects of the 3 varieties
after adjusting for the effect of the natural fertility differences from plot
to plot, as measured by the uniformity trial, we compute F = 22.25/4.68
= 4.75 with degrees of freedom ν1 = 2 and ν2 = 5. Since F = 4.75 lies
between F.90(2,5) = 3.78 and F.95(2,5) = 5.79, the variety differences would
not ordinarily be called statistically significant. However, the reader is
cautioned again that the choice of the significance level is quite arbitrary
and the above results, therefore, may well indicate differences that
are of real importance. Apart from this, we note that the adjustment of
the yields for the unequal fertilities of the experimental plots has caused
the resulting F-value to approach more closely what is conventionally
thought of as a critical value. This should suggest to the researcher that
quite likely the fertility differences among the plots are tending to
obscure the true differences among varieties. If the experiment were
performed again with more replication, significant results might be
obtained.

14.5      LATIN   SQUARE   DESIGN 

The  performance  of  an  analysis  of  covariance  on  data  resulting 
from  a  Latin  square  design  introduces  no  new  concepts.  Thus,  we  shall 
proceed  immediately  to  outline  the  computational  technique  and 
indicate  the  appropriate  test  procedures.  The  only  calculations  besides 
those  specified  for  an  ordinary  analysis  of  variance  in  a  Latin  square 
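are the sums of products. These are found as follows, where $X_{ij(k)}$ and
$Y_{ij(k)}$ denote the observations in the ith row and jth column of an
m × m square, the cell having received treatment k:

Σxy = S_xy = corrected total sum of products for X and Y

$$S_{xy} = \sum_{i=1}^{m}\sum_{j=1}^{m} X_{ij(k)}Y_{ij(k)} - \frac{\left(\sum_{i=1}^{m}\sum_{j=1}^{m} X_{ij(k)}\right)\left(\sum_{i=1}^{m}\sum_{j=1}^{m} Y_{ij(k)}\right)}{m^{2}} \tag{14.34}$$

R_xy = row sum of products for X and Y

$$R_{xy} = \frac{\sum_{i=1}^{m}\left(\sum_{j=1}^{m} X_{ij(k)}\right)\left(\sum_{j=1}^{m} Y_{ij(k)}\right)}{m} - \frac{\left(\sum_{i=1}^{m}\sum_{j=1}^{m} X_{ij(k)}\right)\left(\sum_{i=1}^{m}\sum_{j=1}^{m} Y_{ij(k)}\right)}{m^{2}} \tag{14.35}$$

C_xy = column sum of products for X and Y

$$C_{xy} = \frac{\sum_{j=1}^{m}\left(\sum_{i=1}^{m} X_{ij(k)}\right)\left(\sum_{i=1}^{m} Y_{ij(k)}\right)}{m} - \frac{\left(\sum_{i=1}^{m}\sum_{j=1}^{m} X_{ij(k)}\right)\left(\sum_{i=1}^{m}\sum_{j=1}^{m} Y_{ij(k)}\right)}{m^{2}} \tag{14.36}$$

T_xy = treatment sum of products for X and Y

$$T_{xy} = \frac{\sum_{k=1}^{m} ({}_{x}T_{k})({}_{y}T_{k})}{m} - \frac{\left(\sum_{i=1}^{m}\sum_{j=1}^{m} X_{ij(k)}\right)\left(\sum_{i=1}^{m}\sum_{j=1}^{m} Y_{ij(k)}\right)}{m^{2}} \tag{14.37}$$

$$E_{xy} = S_{xy} - R_{xy} - C_{xy} - T_{xy}, \tag{14.38}$$

where

$${}_{x}T_{k} = \text{sum of all } X \text{ observations associated with treatment } k \tag{14.39}$$

$${}_{y}T_{k} = \text{sum of all } Y \text{ observations associated with treatment } k. \tag{14.40}$$

The results of these calculations are presented in Table 14.8. The test
of H: β = 0 is given by

$$F = \frac{E_{xy}^{2}/E_{xx}}{S_E/[(m - 1)(m - 2) - 1]} = \frac{E_{xy}^{2}/E_{xx}}{s_E^{2}} \tag{14.41}$$

with degrees of freedom $\nu_1 = 1$ and $\nu_2 = (m - 1)(m - 2) - 1$. To test
among adjusted treatment means we compute

$$F = \frac{(S_{T+E} - S_E)/(m - 1)}{S_E/[(m - 1)(m - 2) - 1]} = \frac{(S_{T+E} - S_E)/(m - 1)}{s_E^{2}} \tag{14.42}$$

with degrees of freedom $\nu_1 = m - 1$ and $\nu_2 = (m - 1)(m - 2) - 1$.
The adjusted treatment means may be found using

$$\text{adj. } \bar{Y}_{..(k)} = \bar{Y}_{..(k)} - b(\bar{X}_{..(k)} - \bar{X}) \tag{14.43}$$

where b is the regression coefficient associated with experimental error,
that is, $b = E_{xy}/E_{xx}$. The estimated variance of an adjusted treatment
mean is

$$\hat{V}(\text{adj. } \bar{Y}_{..(k)}) = s_E^{2}\left[\frac{1}{m} + \frac{(\bar{X}_{..(k)} - \bar{X})^{2}}{E_{xx}}\right], \tag{14.44}$$

and the estimated variance of the difference between two adjusted
treatment means is

$$\hat{V}(\text{adj. } \bar{Y}_{..(k)} - \text{adj. } \bar{Y}_{..(k')}) = s_E^{2}\left[\frac{2}{m} + \frac{(\bar{X}_{..(k)} - \bar{X}_{..(k')})^{2}}{E_{xx}}\right]. \tag{14.45}$$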
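No numerical example is given for the Latin square, so the following Python sketch uses a small hypothetical 3 × 3 square; the data, the layout, and the function name `latin_square_products` are all invented for illustration and carry no experimental meaning. The sketch computes the row, column, treatment, and error sums of products in the manner of Equations (14.34) to (14.38).

```python
def latin_square_products(X, Y, layout):
    """Sums of products for an m x m Latin square.

    X, Y   : m x m lists of observations (row i, column j)
    layout : m x m arrangement of treatment labels
    Returns a dict with keys S (total), R (rows), C (columns),
    T (treatments) and E (error).
    """
    m = len(X)
    cells = [(i, j) for i in range(m) for j in range(m)]
    correction = (sum(X[i][j] for i, j in cells)
                  * sum(Y[i][j] for i, j in cells) / m ** 2)

    def grouped(key):
        # Sum over groups of (X total)(Y total)/m, less the correction term.
        tx, ty = {}, {}
        for i, j in cells:
            g = key(i, j)
            tx[g] = tx.get(g, 0) + X[i][j]
            ty[g] = ty.get(g, 0) + Y[i][j]
        return sum(tx[g] * ty[g] for g in tx) / m - correction

    s = sum(X[i][j] * Y[i][j] for i, j in cells) - correction
    r = grouped(lambda i, j: i)                # rows
    c = grouped(lambda i, j: j)                # columns
    t = grouped(lambda i, j: layout[i][j])     # treatments
    return {"S": s, "R": r, "C": c, "T": t, "E": s - r - c - t}


# Hypothetical data, for illustration only.
layout = ["ABC", "BCA", "CAB"]
X = [[10, 12, 11], [9, 14, 13], [12, 10, 15]]
Y = [[20, 25, 23], [19, 27, 26], [24, 22, 30]]

products = latin_square_products(X, Y, layout)
print({name: round(value, 3) for name, value in products.items()})
```

Calling the function with the same table for both arguments, as in `latin_square_products(X, X, layout)`, yields the corresponding sums of squares, from which the error regression coefficient and the F-ratios of Equations (14.41) and (14.42) follow as in the earlier designs.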
14.6     TWO-FACTOR FACTORIAL IN A RANDOMIZED
COMPLETE BLOCK DESIGN

The  performance  of  a  covariance  analysis  when  the  treatments  are 
of  a  factorial  nature  follows  the  pattern  established  in  the  preceding 
sections.  The  only  refinement  that  occurs  is  one  which  is  naturally 
anticipated  once  we  know  we  are  dealing  with  a  factorial  setup:  We 
are  now  able  to  test  among  adjusted  means  for  each  of  the  factors  and 
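for all the interactions. The appropriate calculations, in addition to
those outlined earlier, are given below, where $X_{ijk}$ and $Y_{ijk}$ denote the
observations in the ith replicate on the jth level of factor a and the kth
level of factor b (i = 1, ..., r; j = 1, ..., a; k = 1, ..., b):

Σxy = S_xy = corrected total sum of products for X and Y

$$S_{xy} = \sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} X_{ijk}Y_{ijk} - \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} X_{ijk}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} Y_{ijk}\right)}{rab} \tag{14.46}$$

R_xy = replicate sum of products for X and Y

$$R_{xy} = \frac{\sum_{i=1}^{r}\left(\sum_{j=1}^{a}\sum_{k=1}^{b} X_{ijk}\right)\left(\sum_{j=1}^{a}\sum_{k=1}^{b} Y_{ijk}\right)}{ab} - \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} X_{ijk}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} Y_{ijk}\right)}{rab} \tag{14.47}$$

T_xy = sum of products for X and Y for the a × b table

$$T_{xy} = \frac{\sum_{j=1}^{a}\sum_{k=1}^{b}\left(\sum_{i=1}^{r} X_{ijk}\right)\left(\sum_{i=1}^{r} Y_{ijk}\right)}{r} - \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} X_{ijk}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} Y_{ijk}\right)}{rab} \tag{14.48}$$

$$E_{xy} = S_{xy} - R_{xy} - T_{xy} \tag{14.49}$$

$$A_{xy} = \frac{\sum_{j=1}^{a}\left(\sum_{i=1}^{r}\sum_{k=1}^{b} X_{ijk}\right)\left(\sum_{i=1}^{r}\sum_{k=1}^{b} Y_{ijk}\right)}{rb} - \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} X_{ijk}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} Y_{ijk}\right)}{rab} \tag{14.50}$$

$$B_{xy} = \frac{\sum_{k=1}^{b}\left(\sum_{i=1}^{r}\sum_{j=1}^{a} X_{ijk}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{a} Y_{ijk}\right)}{ra} - \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} X_{ijk}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} Y_{ijk}\right)}{rab} \tag{14.51}$$

$$(AB)_{xy} = \frac{\sum_{j=1}^{a}\sum_{k=1}^{b}\left(\sum_{i=1}^{r} X_{ijk}\right)\left(\sum_{i=1}^{r} Y_{ijk}\right)}{r} - \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} X_{ijk}\right)\left(\sum_{i=1}^{r}\sum_{j=1}^{a}\sum_{k=1}^{b} Y_{ijk}\right)}{rab} - A_{xy} - B_{xy}. \tag{14.52}$$

These calculations are summarized in Table 14.9. Following are the
appropriate F-ratios for testing the hypotheses: (1) no differences
among the true effects of the levels of factor a on the variable Y after
adjusting for the concomitant variable X, (2) no differences among the
true effects of the levels of factor b on the variable Y after adjusting
for the concomitant variable X, and (3) no interaction between factor
a and factor b as they affect the variable Y after adjusting for the effect
of the concomitant variable X, respectively,

$$F = \frac{(S_{A+E} - S_E)/(a - 1)}{s_E^{2}}, \tag{14.53}$$

$$F = \frac{(S_{B+E} - S_E)/(b - 1)}{s_E^{2}}, \tag{14.54}$$

and

$$F = \frac{(S_{AB+E} - S_E)/[(a - 1)(b - 1)]}{s_E^{2}}, \tag{14.55}$$

where $S_{A+E}$, $S_{B+E}$, and $S_{AB+E}$ are the sums of squares of deviations about
regression computed from the pooled "effect + error" lines and
$s_E^{2} = S_E/[(r - 1)(ab - 1) - 1]$.

The test of H: β = 0 is performed by calculating

$$F = \frac{E_{xy}^{2}/E_{xx}}{s_E^{2}} \tag{14.56}$$

with degrees of freedom $\nu_1 = 1$ and $\nu_2 = (r - 1)(ab - 1) - 1$.

The appropriate standard errors for the various mean effects are
found from the following estimated variances:

A-effect

$$\hat{V}(\text{adj. } \bar{Y}_{.j.}) = s_E^{2}\left[\frac{1}{rb} + \frac{(\bar{X}_{.j.} - \bar{X})^{2}}{E_{xx}}\right] \tag{14.57}$$

B-effect

$$\hat{V}(\text{adj. } \bar{Y}_{..k}) = s_E^{2}\left[\frac{1}{ra} + \frac{(\bar{X}_{..k} - \bar{X})^{2}}{E_{xx}}\right] \tag{14.58}$$

AB-effect

$$\hat{V}(\text{adj. } \bar{Y}_{.jk}) = s_E^{2}\left[\frac{1}{r} + \frac{(\bar{X}_{.jk} - \bar{X})^{2}}{E_{xx}}\right]. \tag{14.59}$$

Example 14.3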

As  an  example  of  a  covariance  analysis  in  a  randomized  complete 
block  design  where  the  treatments  are  of  a  factorial  nature,  consider 
the data of Table 14.10. These data were originally examined by
Wishart  (17),  and  the  interested  reader  is  referred  to  his  study  for  a  more 
detailed  discussion  than  we  shall  give  here. 

In  discussing  the  data  of  Table  14.10,  we  shall  consider  the  5  pens  as 
5  replicates,  and  thus  we  have  a  3X2  factorial  in  a  randomized  complete 
block  design.  Following  the  calculational  procedure  outlined  in  Table 
14.9,  we  arrive  at  the  results  presented  in  Table  14.11. 

To test H: β = 0, we calculate

$$F = \frac{(39.367)^{2}/442.93}{0.2534} = 13.81$$

with degrees of freedom ν1 = 1 and ν2 = 19, and this is significant at
α = .01.

TABLE 14.10-Initial Weights and Gains in Weight of Young Pigs
in a Comparative Feeding Trial

(X = initial weight in pounds; Y = gain in weight in pounds)

                              Feeding Treatments
                     A                  B                  C
Pen            Male   Female      Male   Female      Male   Female      Totals
I        X      38      48         39      48         48      48          269
         Y     9.52    9.94       8.51   10.00       9.11    9.75        56.83
II       X      35      32         38      32         37      28          202
         Y     8.21    9.48       9.95    9.24       8.50    8.66        54.04
III      X      41      35         46      41         42      33          238
         Y     9.32    9.32       8.43    9.34       8.90    7.63        52.94
IV       X      48      46         40      46         42      50          272
         Y    10.56   10.90       8.86    9.68       9.51   10.37        59.88
V        X      43      32         40      37         40      30          222
         Y    10.42    8.82       9.20    9.67       8.76    8.57        55.44
Totals   X     205     193        203     204        209     189         1203
         Y    48.03   48.46      44.95   47.93      44.78   44.98       279.13

Reproduced from Table 11 in John Wishart, Field Trials II: The Analysis of Covariance,
Tech. Comm. No. 15, Commonwealth Bureau of Plant Breeding and Genetics, School of
Agriculture, Cambridge, England, May, 1950. With permission of author and publishers.

TABLE 14.11-Analysis of Covariance for the Data of Table 14.10

                                          Sum of Squares and Products      Deviations About Regression
Source of Variation          Degrees of   Σx²       Σxy       Σy²          Σy² - (Σxy)²/Σx²   Degrees of   Mean
                             Freedom                                                          Freedom      Square
Replicates (pens)                4        605.87    39.905    4.8518
Treatments
  Food                           2          5.40    -0.147    2.2686
  Sex                            1         32.03    -3.730    0.4344
  Food × sex                     2         22.47     3.112    0.4761
Experimental error              20        442.93    39.367    8.3144           4.8155            19        0.2534
Food + error                    22        448.33    39.220   10.5830           7.1520            21
Difference for testing among
  adjusted food means                                                          2.3365             2        1.16825
Sex + error                     21        474.96    35.637    8.7488           6.0749            20
Difference for testing among
  adjusted sex means                                                           1.2594             1        1.2594
(Food × sex) + (error)          22        465.40    42.479    8.7905           4.9133            21
Difference for testing among
  adjusted food × sex effects                                                  0.0978             2        0.0489

The various treatment effects may also be tested for significance,
the appropriate variance ratios being

Food:          F = 1.16825/0.2534 = 4.61
Sex:           F = 1.2594/0.2534 = 4.97
Food × Sex:    F = 0.0489/0.2534 = 0.19


where  the  degrees  of  freedom  are  as  given  in  Table  14.11.  These  F-ratios 
(and  the  corresponding  inferences)  should  be  compared  with  those 
resulting  from  an  analysis  of  variance  on  the  gains  in  weight  taking  no 
account  of  the  varying  initial  weights.  Such  comparisons  will  aid  the 
reader  in  understanding  the  principles  of  covariance  analyses  and,  in 
our  example,  will  help  to  explain  the  effect  of  initial  weights  on  weight 
gains  subject  to  the  chosen  experimental  conditions.  A  table  of  adjusted 
treatment  means,  together  with  the  appropriate  standard  errors, 
should  also  be  presented  to  make  the  analysis  complete. 
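The "effect + error" arithmetic of Table 14.11 is the same for every line, which makes it convenient to express once and reuse. The Python sketch below is an illustration only (the names `deviations`, `effects`, and `f_ratios` are assumptions introduced here); it starts from the sums of squares and products printed in Table 14.11 and recovers the four F-ratios of this example.

```python
def deviations(sxx, sxy, syy):
    """Sum of squares of deviations about regression: Σy² - (Σxy)²/Σx²."""
    return syy - sxy ** 2 / sxx


# (Σx², Σxy, Σy²) lines of Table 14.11.
error = (442.93, 39.367, 8.3144)          # experimental error, 20 d.f.
error_df = 20
effects = {
    "food":       ((5.40, -0.147, 2.2686), 2),
    "sex":        ((32.03, -3.730, 0.4344), 1),
    "food x sex": ((22.47, 3.112, 0.4761), 2),
}

s_e = deviations(*error)
s2_e = s_e / (error_df - 1)               # one d.f. is used by the regression

f_ratios = {}
for name, (line, df) in effects.items():
    pooled = tuple(a + b for a, b in zip(line, error))     # "effect + error" line
    f_ratios[name] = ((deviations(*pooled) - s_e) / df) / s2_e

# Test of H: beta = 0, Equation (14.56).
f_ratios["beta"] = (error[1] ** 2 / error[0]) / s2_e

for name, f in f_ratios.items():
    print(f"{name:>10}: F = {f:.2f}")
```

Each effect is pooled with error separately, so the adjustment for X is made afresh for each hypothesis rather than once for all treatments combined.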

14.7      COVARIANCE WHEN THE X VARIABLE IS
AFFECTED BY THE TREATMENTS

When the treatments being employed in the experiment are such
that they have an appreciable effect on the concomitant variate, X,
as well as on the Y variate, the researcher should proceed with caution.
Computationally, each step is carried through as before, but the final
inferences must take account of the effect which the treatments have
had on the concomitant variate. For example, if the concomitant
variate in a feeding experiment had been "amount of feed consumed"
rather than the "initial weight," it is quite possible that the different
treatments (feeds) would have a significant effect on the food consumption.
Thus any covariance analysis of gain in weight should take
cognizance of the weight-producing effects of the different feeds due to
increased (or decreased) consumption apart from any nutritional differences
among the feeds. Other examples of covariance analyses involving
similar problems are available in the literature and should be
studied critically by those who desire a fuller understanding of covariance
techniques. The reader is especially referred to Cochran and
Cox (5) and Bartlett (2) for a discussion of this type of problem.

14.8      MULTIPLE  COVARIANCE 

As  might  be  anticipated,  procedures  are  available  for  performing 
covariance  analyses  when  data  are  collected  on  two  or  more  concomit 
ant  variables.  These  procedures  will  follow  the  same  pattern  that  has 
been  used  in  the  preceding  sections  of  this  chapter,  the  only  change 
being that the sum of squares due to regression is calculated in accordance
with the principles outlined in Section 8.15. Rather than specify
an  analytical  approach  for  the  general  case,  let  us  be  content  with  a 
numerical  example  to  illustrate  the  ideas  involved. 

Example  14.4 

Crampton  and  Hopkins  (8)  studied  the  effects  of  initial  weight  and 
food  consumption  on  the  gaining  ability  of  pigs  when  given  different 
feeds.  The  data  are  presented  in  Table  14.12. 

To carry out a multiple covariance analysis, the first step is to find the
various sums of squares and products for treatments and for error and,
hence, for "treatments + error." These are determined to be

T_x1x1 = 509.2          E_x1x1 = 368.4          S_x1x1 = 877.6
T_x1x2 = 2187.8         E_x1x2 = 2646.2         S_x1x2 = 4834.0
T_x2x2 = 28,404.9       E_x2x2 = 90,792.3       S_x2x2 = 119,197.2
T_x1y  = 1172.2         E_x1y  = 1001.8         S_x1y  = 2173.8
T_x2y  = 11,596.5       E_x2y  = 24,508.7       S_x2y  = 36,105.2
T_yy   = 5741.7         E_yy   = 10,405.5       S_yy   = 16,147.2

where

Y = final weight
X1 = initial weight
X2 = feed eaten.

The next step is to calculate the partial regression coefficients associated
with the multiple regression equation so that we may obtain
the sum of squares "due to regression" and thus, by subtraction from
the corrected total sum of squares, the sum of squares of the deviations
about regression. The required partial regression coefficients may be
found using the methods of Chapter 8. Since we are dealing with a
case involving only two independent variables, it is simpler to solve the
following equations:

Error

$$368.4\,b_{1E} + 2646.2\,b_{2E} = 1001.8$$
$$2646.2\,b_{1E} + 90{,}792.3\,b_{2E} = 24{,}508.7$$

Treatment + error

$$877.6\,b_{1(T+E)} + 4834.0\,b_{2(T+E)} = 2173.8$$
$$4834.0\,b_{1(T+E)} + 119{,}197.2\,b_{2(T+E)} = 36{,}105.2$$

giving $b_{1E} = .9868$, $b_{2E} = .2412$, $b_{1(T+E)} = 1.0410$ and $b_{2(T+E)} = .2607$.

Thus, the sums of squares due to regression (and, hence, the sum of
squares of the deviations about regression) for "error" and for "treatment
+ error," respectively, are determined to be:

Error

S.S. due to regression = (.9868)(1001.8) + (.2412)(24,508.7)
                       = 6900.07
S.S. of deviations about regression = 10,405.5 - 6900.07
                                    = 3505.43

Treatment + error

S.S. due to regression = (1.0410)(2173.8) + (.2607)(36,105.2)
                       = 11,675.45
S.S. of deviations about regression = 16,147.2 - 11,675.45
                                    = 4471.75.

These results are then presented in Table 14.13. Notice that this time
we have "lost" two degrees of freedom rather than one degree of freedom
as in a simple covariance. If there had been k independent (concomitant)
variables, we would have "lost" k degrees of freedom. The reader will
note that this is in agreement with the procedures outlined in Chapter 8.
The F-test is then performed as before, and we obtain
F = 241.58/103.10 = 2.34 with degrees of freedom ν1 = 4 and ν2 = 34,
which is not significant at α = .05. This should be compared with the
results obtainable from an analysis of variance of the final weights, and
the appropriate conclusions drawn. Adjusted treatment means and their
standard errors may be calculated by combining methods indicated in
Section 8.25 and earlier sections of this chapter.

TABLE 14.13-Abbreviated Analysis of Covariance for Data of Table 14.12

Source of Variation          Degrees of   Σy²          S.S. Due to    S.S. Deviations     Degrees of   Mean
                             Freedom                   Regression     About Regression    Freedom      Square
Treatments (T)                   4         5741.7
Error (E)                       36        10,405.5       6900.07          3505.43             34       103.10
T + E                           40        16,147.2     11,675.45          4471.75             38
Difference for testing among
  adjusted treatment means                                                 966.32              4       241.58
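The two pairs of normal equations in Example 14.4 are small enough to solve by Cramer's rule. The Python sketch below is a hedged illustration (the function names and the dictionary layout are assumptions made here, and only the summary sums of squares and products quoted in the example are used, since Table 14.12 itself is not reproduced). Because the coefficients are carried unrounded, the sums of squares differ from the tabled values in the last figures, though the F-ratio agrees to two decimals.

```python
def solve_2x2(a11, a12, a22, c1, c2):
    """Solve a11*b1 + a12*b2 = c1 and a12*b1 + a22*b2 = c2 by Cramer's rule."""
    det = a11 * a22 - a12 ** 2
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


def deviations_about_regression(sx1x1, sx1x2, sx2x2, sx1y, sx2y, syy):
    """Return (S.S. of deviations, (b1, b2)) for two concomitant variables."""
    b1, b2 = solve_2x2(sx1x1, sx1x2, sx2x2, sx1y, sx2y)
    ss_regression = b1 * sx1y + b2 * sx2y
    return syy - ss_regression, (b1, b2)


error = dict(sx1x1=368.4, sx1x2=2646.2, sx2x2=90792.3,
             sx1y=1001.8, sx2y=24508.7, syy=10405.5)
pooled = dict(sx1x1=877.6, sx1x2=4834.0, sx2x2=119197.2,
              sx1y=2173.8, sx2y=36105.2, syy=16147.2)

dev_e, b_e = deviations_about_regression(**error)
dev_te, b_te = deviations_about_regression(**pooled)

treatment_df, error_df, k = 4, 36, 2      # k concomitant variables
df2 = error_df - k                        # k degrees of freedom are "lost"
f = ((dev_te - dev_e) / treatment_df) / (dev_e / df2)

print(f"error coefficients: {b_e[0]:.4f}, {b_e[1]:.4f}")
print(f"treatment + error coefficients: {b_te[0]:.4f}, {b_te[1]:.4f}")
print(f"F = {f:.2f} on ({treatment_df}, {df2}) d.f.")
```

With more than two concomitant variables the same pattern holds, except that the normal equations would be solved by the general methods of Chapter 8 and k would be raised accordingly.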

Problems 

14.1        An experiment using a randomized complete block design gave the
following corrected sums of squares and products:

Source of Variation       Degrees of     Σx²      Σxy      Σy²       b
                          Freedom
Replicates                     5         200      600     4000
Treatments                     5         100      200     2500       2
Experimental error            25         300     1200     7500       4

(a)   Based on the experimental error sum of squares and products,
      is the regression of Y on X significant at α = .05?
(b)   Are the differences among the treatment means for Y adjusted for
      variation attributed to X significant at α = .05?
(c)   What conclusions do you draw from the above data about the
      effects of treatments? Make any additional computations that
      you consider necessary.
14.2        Given the following data:

Source of Variation       Degrees of     Σx²      Σxy      Σy²
                          Freedom
Replicates                     4         100      140      400
Treatments                    10         100      100      900
Experimental error            40         400      900     2500

(a)   What conclusions may be drawn about the effect of treatments
      on Y?
(b)   Test the regression coefficient based on experimental error for
      significance at the 5 per cent level.

14.3  Ten lines of soybeans were compared in randomized complete blocks
with 4 replications. The differences in yield, Y, were not significant,
but it was observed that the incidence of an infestation, X, differed
among the varieties. Following is the table of sums of squares and
products:

Source of Variation       Degrees of     Σx²      Σxy      Σy²
                          Freedom
Lines                          9        4684     -532      112
Error                         27        3317     -650      216

Test the hypothesis that the yields adjusted for infestation do not
differ in the sampled populations. What fraction of Σy² for lines is
unexplained by the regression?

14.4  For an experiment involving 9 soil sterilization treatments, the effect
on the number of seedling alfalfa plants (X) and on the green weight
of plants at 3 weeks (Y) is summarized by the following sums of
squares and products:

Source of Variation       Degrees of     Σx²      Σxy      Σy²
                          Freedom
Replicates                     5           4       16       96
Treatments                     8          16       32       80
Error                         40          20       40      160

Complete the analysis, making appropriate tests to indicate the
reason for your conclusions. If the mean of the X's is 15 and the
mean of the Y's is 25, give the regression equation for error.
14.5  A  study  of  eastern  Iowa  farms  included  one  group  of  tenants  who 
were  not  related  to  their  landlords,  and  another  group  of  tenants, 
each  of  whom  was  related  to  his  landlord.  It  was  assumed  that  soil 
improvements  would  be  more  generally  undertaken  when  landlord 
and  tenant  were  related.  Hence,  value  of  crops  should  be  greater  in 
those  situations.  An  analysis  of  variance  was  undertaken  to  examine 
this  hypothesis.  Since  size  of  farm  could  confuse  the  comparison,  the 
size  of  farm  was  introduced  as  a  covariate.  The  following  table  was 
prepared:

COVARIANCE ANALYSIS OF VALUE OF CROPS ON SIZE OF FARM FOR
EASTERN IOWA FARMS WITH LANDLORD AND TENANT RELATED
AND LANDLORD AND TENANT NOT RELATED

Source of Variation        Degrees of Freedom      Σx²       Σxy       Σy²
Total                              59            125000     33000     36600
Sub-areas (replicates)              4             20000     14010     13600
Bet. grps of tenants                1             61000     13260      4200
Interaction                         4              4000    -13270      6100
Within subclasses                  50             40000     19000     12700

Value of crops has been coded for this analysis.

(a) Is the acceptance of the hypothesis, H (no difference between
groups of tenants), changed by the introduction of farm size as the
covariate?
(b) Is the error regression significant?

14.6  A sample of farms was taken in the eastern livestock area of Iowa for
the purpose of studying certain types of farm lease arrangements.
For this problem we are taking a portion of the data to study the
difference in "gross value of crops" produced on two groups of
cash-rented farms: (1) farms for which landlord and tenant are related,
and (2) farms for which landlord and tenant are not related. The
variates measured are value of crops produced (Y) and size of farm
(X). The data presented are given for 3 hypothetical blocks. In
practice, these blocks might be strata, e.g., different counties,
different soil areas, type-of-farming areas, or groups of farms enumerated
by different enumerators, for instance, 3 agricultural economics
students for the 3 blocks of our example. The data are presented in the
table following. Perform an analysis of covariance on these data.

FARM DATA FROM EASTERN LIVESTOCK AREA, IOWA, FOR COVARIANCE
ANALYSIS OF VALUE OF CROPS AND SIZE OF FARM. GROUPS TO BE
COMPARED: FARMS WITH LANDLORD AND TENANT RELATED
AND LANDLORD AND TENANT NOT RELATED*

            Block I                  Block II                 Block III
Farm No.      Y       X    Farm No.      Y       X    Farm No.      Y       X

Related
   22       6399     160      27       2490      90      17       4489     120
   13       8456     320      24       5349     154      25      10026     245
   20       8453     200      11       5518     160       1       5659     160
    8       4891     160      34      10417     234      26       5475     160
   21       3491     120      38       4278     120       4      11382     320

Not Related
   31       6944     160      13       4936     160      20       5731     160
   30       6971     160       1       7376     200      15       6787     173
   11       4053     120      19       6216     160       7       5814     134
    6       8767     280      32      10313     240       5       9607     239
   16       6765     160      28       5124     120      25       9817     320

* Source: Agricultural Economics Dept., Iowa State College, 1951.

14.7  (a) What are the assumptions behind a covariance analysis?
(b) In the process of analyzing data by a covariance analysis, what
tests of significance are made?
(c) Explain the interpretation or inferences and the course of action
indicated when each of the above tests is significant; when each
is nonsignificant.

14.8  The  following  is  an  experiment  involving  randomized  complete  blocks 
with  4  replications.  Eleven  lines  of  soybeans  were  planted.  The  data 
are  as  follows: 

X1 = maturity, measured in days later than the Hawkeye variety
X2 = lodging, measured on a scale from 0 to 5
Y  = infection by stem canker, measured as a percentage of stalks
infected.

             Replicate 1        Replicate 2        Replicate 3        Replicate 4
Line        X1    X2     Y     X1    X2     Y     X1    X2     Y     X1    X2     Y
Lincoln      9   3.0   19.3    10   2.0   29.2    12   3.0    1.0     9   2.5    6.4
A7-6102     10   3.0   10.1    10   2.0   34.7     9   2.0   14.0     9   3.0    5.6
A7-6323     10   2.5   13.1     9   1.5   59.3    12   2.5    1.1    10   2.5    8.1
A7-6520      8   2.0   15.6     5   2.0   49.0     8   2.0   17.4     6   2.0   11.7
A7-6905     12   2.5    4.3    11   1.0   48.2    13   3.0    6.3    10   2.5    6.7
C-739        4   2.0   25.2     2   1.5   36.5     2   2.0   23.4     1   2.0   12.9
C-776        3   1.5   67.6     4   1.0   79.3     6   2.0   13.6     2   1.5   39.4
H-6150       7   2.0   35.1     8   2.0   40.0     7   2.0   24.7     7   2.0    4.8
L6-8477      8   2.0   14.0     8   1.5   30.2    10   1.5    7.2     7   2.0    8.9
L7-1287      9   2.5    3.3     9   2.0   35.8    13   3.0    1.1     9   3.0    2.0
BEIV Sp     10   3.5    3.1    10   3.0    9.6    11   3.0    1.0    10   3.5    0.1

The  principal  objective  is  to  learn  whether  maturity  or  lodging  is 
more  closely  related  to  infection.  Determine  this  from  the  error 
multiple  regression.  Test  the  hypothesis  of  no  differences  among 
adjusted  mean  infection  for  the  varieties. 

14.9  Discuss the use of covariance analysis. What factors must be
considered in interpreting the results of the analysis?

14.10  The data for this problem consist of 54 pairs of observations on the
calories consumed (Y) on one particular day by a respondent, and her
age (X). The respondents were adult Iowa women over the age of 30
who were interviewed to obtain information on nutrition and health.
About 1000 women were so enumerated for this survey, and our
group of 54 is a subgroup from the total, which was taken so as to
make numbers in the subclasses equal.

Among the items observed for each respondent in addition to
caloric intake and age was place of residence (zone) and income class.
These are listed as

Zone 1: open country
Zone 2: rural place
Zone 3: urban

Income Group     Income
     1           $0-999
     2           $1000-1499
     3           $1500-1999
     4           $2000-2999
     5           $3000-3999
     6           over $4000

Education, height, weight, national origin, marital status, family
composition, and many other factors were recorded for each
respondent.

The nutritionists studying these data are interested in determining
how food intake and health are related to these other observed
factors. A few relevant hypotheses could be advanced. Preliminary
analysis consisted of preparing tables of means for several
classifications of the total sample and graphical analysis (plotting on
scattergrams of a subsample of 60 stratified by age). A number of nutritive
factors exhibited an apparent negative regression on age. Age thus
seemed a useful covariate. Other factors, such as education, height,
and weight, seemed to indicate no relation to nutritive intake.
With this background we shall use these data to undertake an
analysis of covariance for the purpose of testing hypotheses about
zone and income group effects after taking account of the regression
on age. The data table gives the 54 pairs of observations with the
sums, sums of squares, and sums of products. Both zone and income
group may be considered as fixed effects.

(a) Prepare the analysis of covariance table.

(b) Find the error regression of calories on age.

(c) Test the hypotheses that zone and income effects are = 0
(separately, of course).

(d) It would also be of interest to test for interaction of zone and
income group. What do you conclude on this point?

(e) The regression of calories on age may not be homogeneous over
the zones. Indicate by a schematic analysis of variance of
regression how you would examine these regressions.


                    Zone 1          Zone 2          Zone 3
Income Group       Y      X        Y      X        Y      X
     1           1911    46      1318    80      1127    74
                 1560    66      1541    67      1509    71
                 2639    38      1350    73      1756    60
     2           1034    50      1559    58      1054    83
                 2096    33      1260    74      2238    47
                 1356    44      1772    44      1599    71
     3           2130    35      2027    32      1479    56
                 1878    45      1414    51      1837    40
                 1152    59      1526    34      1437    66
     4           1297    68      1938    33      2136    31
                 2093    43      1551    40      1765    56
                 2035    59      1450    39      1056    70
     5           2189    33      1183    54      1156    47
                 2078    36      1967    36      2660    43
                 1905    38      1452    53      1474    50
     6           1156    57      2599    35      1015    63
                 1809    52      2355    64      2555    34
                 1997    44      1932    79      1436    54

ΣY² = 166416926     ΣXY = 4573454     ΣX² = 157356
 CT = 156053200      CT = 4773496      CT = 146016
Σy² =  10363726     Σxy = -200042     Σx² =  11340

CHAPTER 15

DISTRIBUTION-FREE  METHODS 

IN  PRECEDING  CHAPTERS  the  emphasis  has  been  on  those  statistical 
techniques  which  assume  the  sampled  populations  to  be  of  known  form. 
However,  because  the  analyst  is  not  always  certain  of  the  validity  of 
such  assumptions  and/or  because  not  all  statistical  techniques  are 
robust  (i.e.,  insensitive  to  departures  from  such  assumptions),  much 
work  has  been  done  in  recent  years  to  devise  procedures  which  are  free 
of these restrictions. These new techniques, referred to as
distribution-free methods,¹ will be the subject of this chapter.

Although  most  distribution-free  methods  have  been  developed  only 
recently,  the  literature  in  this  area  is  already  quite  extensive.  Further, 
it  continues  to  grow  every  day.  Consequently,  it  will  be  impossible  to 
do  more  than  mention  a  few  of  the  more  popular  and  useful  methods 
in  this  book.  Those  persons  who  wish  to  delve  deeper  into  this  area  of 
statistics  are  encouraged  to  consult  the  references  listed  at  the  end  of 
this  chapter. 

15.1  DISTRIBUTION-FREE  METHODS   INCLUDED    IN 
PREVIOUS  CHAPTERS 

Four widely used distribution-free methods have already been
introduced in earlier chapters. These are: (1) Tchebycheff's inequality
discussed in Section 5.3, (2) the distribution-free tolerance limits referred
to in Section 6.13, (3) the chi-square goodness of fit test described in
Section 7.15, and (4) the measures of rank correlation described in
Section 9.11. Because these methods were discussed in the
aforementioned sections, it would be superfluous to repeat their descriptions
at this time. It is recommended, however, that the indicated sections
be reread in the present context. Let us now proceed to the study of
some additional distribution-free methods that have been found useful
in a variety of situations.

15.2  THE  SIGN  TEST 

¹ Many authors refer to distribution-free methods as nonparametric methods
and, although the expressions are not strictly equivalent, they have been, and
probably will continue to be, used interchangeably.

In many experimental situations, the investigator wishes to compare
the effects of two treatments. When the data occur in pairs, one member
of the pair being associated with treatment A and the other with
treatment B, one test of wide applicability is the sign test. Using
inequality signs to denote the relationship between the members of a
pair, whether the comparison be qualitative or quantitative,² the sign
test proceeds as follows:

(1) Examine each of the pairs (X_i, Y_i).

(2) If X_i > Y_i, assign a plus sign; if X_i < Y_i, assign a minus sign; if
X_i = Y_i, discard the pair.

(3)  Denote  the  number  of  pairs  remaining,  that  is,  the  number  of 
pairs  resulting  in  either  a  plus  or  minus  sign,  by  n. 

(4)  Denote  by  r  the  number  of  times  the  less  frequent  sign  occurs. 

(5) To test the hypothesis of no difference between the effects of
the two treatments, compare r with the critical values
tabulated in Appendix 13.

(6) If the observed value of r is less than or equal to the tabulated
value for the chosen significance level, the hypothesis is
rejected; otherwise, it is not rejected.
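
The six steps above can be sketched in Python. This is only an illustration under stated assumptions: the function names `sign_test` and `sign_test_p_value` are hypothetical, and an exact two-sided binomial p-value is computed in place of the critical values of Appendix 13, which are not reproduced here.

```python
from math import comb

def sign_test_p_value(n, r):
    """Two-sided exact p-value: 2 * P(R <= r) for R ~ Binomial(n, 1/2), capped at 1."""
    tail = sum(comb(n, k) for k in range(r + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def sign_test(pairs):
    """Return (n, r, p_value) for the sign test applied to (x, y) pairs."""
    plus = sum(1 for x, y in pairs if x > y)
    minus = sum(1 for x, y in pairs if x < y)   # pairs with x == y are discarded
    n = plus + minus                            # number of pairs that yield a sign
    r = min(plus, minus)                        # count of the less frequent sign
    return n, r, sign_test_p_value(n, r)

# The counts quoted in Examples 15.1 and 15.2
print(sign_test_p_value(15, 4))    # compare with alpha = 0.05
print(sign_test_p_value(45, 10))   # compare with alpha = 0.01
```

A p-value at or below the chosen significance level plays the role of "r less than or equal to the tabulated value" in step (6).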

Before  giving  numerical  illustrations,  it  is  appropriate  that  attention 
be  called  to  different  hypotheses  that  can  be  tested  in  the  manner 
indicated  above.  Some  of  these  are: 

(1) Each difference X_i - Y_i has a probability distribution (which
need not be the same for all differences) with median equal to
0, that is, H: P{X_i > Y_i} = 0.5 for all i.

(2) If the underlying distributions are assumed to be symmetric,
the sign test may be used to test the hypothesis H: μ_{X_i} = μ_{Y_i}.

(3) If it can be assumed that the underlying distributions differ
only in their means, then a test of H: μ_{X_i} = μ_{Y_i} is equivalent
to testing the hypothesis that the probability distributions of
each pair are the same.

(4) Questions such as:

(a) Is A better than B by P per cent?
and

(b) Is A better than B by U units?

may also be studied by applying the sign test to the differences
D = A - (1 + P/100)B and D = A - (B + U), respectively.
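
Item (4) amounts to shifting the B readings before the signs are counted. A minimal Python sketch follows; the function name `shifted_sign_counts` and the paired readings are hypothetical and serve only as illustration.

```python
def shifted_sign_counts(a, b, percent=0.0, units=0.0):
    """Count signs of D = A - (1 + P/100) * B - U over paired readings.

    percent=P with units=0 gives D = A - (1 + P/100) B;
    units=U with percent=0 gives D = A - (B + U).
    """
    diffs = [x - (1 + percent / 100) * y - units for x, y in zip(a, b)]
    plus = sum(1 for d in diffs if d > 0)
    minus = sum(1 for d in diffs if d < 0)      # zero differences are discarded
    return plus + minus, min(plus, minus)       # (n, r) for the sign test

# Hypothetical paired readings: is A better than B by 2 units?
a = [14.1, 12.7, 15.3, 13.9, 16.2, 12.1]
b = [11.0, 11.5, 12.8, 12.4, 13.1, 11.2]
n, r = shifted_sign_counts(a, b, units=2.0)
print(n, r)
```

The resulting n and r are then referred to the sign test exactly as for unshifted differences.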

Example  15.1 

Consider once again the data given in Table 7.6 and discussed in
Example 7.21. We note that n = 15 and r = 4. Assuming α = 0.05, it is
seen that the hypothesis of equal hardness indications by the two steel
balls cannot be rejected since the critical value of r tabulated in
Appendix 13 is 3. You will note that this is the opposite decision to that
reached in Example 7.21. The reason for this is that, when normality
can be assumed, the sign test is less efficient (that is, less sensitive) than
"Student's" t-test.

² If measurements are recorded, then X_i > Y_i will signify that, in the ith pair,
treatment A resulted in a higher reading than treatment B. If no measurements
are available, then X_i > Y_i will signify that, in the ith pair, treatment A resulted
in something larger than (or better than or preferred over) treatment B.

Example 15.2

In a marketing study, two brands of lemonade were compared. Each
of 50 judges tasted two samples, one of brand A and one of brand B,
with the following results: 35 preferred brand A, 10 preferred brand B,
and 5 could not tell the difference. Thus, n = 45 and r = 10. Assuming
α = 0.01, we reject the hypothesis of equal preference (since r = 10 < 13
= critical value) and conclude that brand A is preferred.

15.3     THE  SIGNED   RANK  TEST 

The  sign  test  described  in  Section  15.2  was  simple  to  apply.  However, 
when  measurement  data  have  been  obtained,  it  is  not  the  most  efficient 
distribution-free  test  available.  A  better  test,  sometimes  referred  to  as 
the  Wilcoxon  signed  rank  test  and  at  other  times  more  simply  as  the 
signed  rank  test,  is  one  which  takes  account  of  the  magnitude  of  the 
observed  differences.  It  proceeds  as  follows: 

(1)  Rank  the  differences  without  regard  to  sign,  that  is,  rank  the 
absolute  values  of  the  differences.  (The  smallest  difference  is 
given  rank  1  and  ties  are  assigned  average  ranks.) 

(2)  Assign  to  each  rank  the  sign  of  the  observed  difference. 

(3)  Obtain  the  sum  of  the  negative  ranks  and  the  sum  of  the 
positive  ranks. 

(4)  Denote  by  T  the  absolute  value  of  the  smaller  of  the  two 
sums  of  ranks  found  in  the  previous  step. 

(5) To test the hypothesis of no difference between the effects of
the two treatments, compare T with the critical values
tabulated in Appendix 14.

(6) If the observed value of T is less than or equal to the tabulated
value for the chosen significance level, the hypothesis is
rejected; otherwise, it is not rejected.
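
Steps (1) through (4) can be sketched in Python as follows. The function name `signed_rank_statistic` is hypothetical, and dropping zero differences before ranking is an assumption not stated in the procedure above. The critical values of Appendix 14 are not reproduced, so the sketch stops at T.

```python
def signed_rank_statistic(differences):
    """Return (T, positive_sum, negative_sum, n) for the signed rank test."""
    d = [x for x in differences if x != 0]      # assumption: zero differences are dropped
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of tied absolute values
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average = (i + j) / 2 + 1               # mean of the ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    positive = sum(r for r, x in zip(ranks, d) if x > 0)
    negative = sum(r for r, x in zip(ranks, d) if x < 0)   # reported as a magnitude
    return min(positive, negative), positive, negative, len(d)

# Differences listed in Table 15.1
table_15_1 = [22, 2, 4, 12, 11, 15, 28, -5, 8, 4, -1, -10, -2, 25, 7]
print(signed_rank_statistic(table_15_1))

# One-sample use: subtract the hypothesized median first (Table 15.2, median 12)
observations = [12.55, 14.62, 12.93, 12.46, 11.95, 14.55, 13.11, 10.90]
print(signed_rank_statistic([x - 12 for x in observations]))
```

For these two data sets the sketch gives T = 18.5 and T = 6, the values used in Examples 15.3 and 15.4.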

Before  giving  numerical  illustrations,  we  should  note  that  the  signed 
rank  test  is  also  applicable  in  the  following  situations : 

(1)  To  test  the  hypothesis  that  the  median  of  a  population  is 
equal  to  some  specified  value,  say  M0. 

(2) To test the hypothesis that the median of a population of
differences is equal to some specified value, say M_0.

It should be clear, of course, that in Case 1 the basic variable is
|X - M_0|, while in Case 2 it is |(X - Y) - M_0|. Apart from this
obvious transformation, the procedure is exactly as specified above.

Example  15.3 

Consider again the data of Table 7.6. These are reproduced in Table
15.1 for your convenience. Applying the procedure for the signed rank
test, it is seen that T = 18.5. Assuming α = 0.05 and consulting Appendix
14, it may be verified that T = 18.5 < T_c = 25, and therefore, the
hypothesis of equal treatment effects is rejected. The reader should
compare this result with those reached in Examples 7.21 and 15.1.

TABLE 15.1-Data Obtained in a Brinell Hardness Test

Sample     Differences                      Signed Rank
Number        (D)        Rank of |D|     Positive    Negative
   1           22            13             13
   2            2             2.5            2.5
   3            4             4.5            4.5
   4           12            11             11
   5           11            10             10
   6           15            12             12
   7           28            15             15
   8           -5             6                          -6
   9            8             8              8
  10            4             4.5            4.5
  11           -1             1                          -1
  12          -10             9                          -9
  13           -2             2.5                        -2.5
  14           25            14             14
  15            7             7              7
Total                                      101.5        -18.5

Example  15.4 

Given the data of Table 15.2, test the hypothesis that the population
median equals 12. It is easily seen that the calculations, also shown in
Table 15.2 for convenience, lead to T = 6 which, for n = 8 and α = 0.05,
tells us we are unable to reject the stated hypothesis.

TABLE 15.2-Hypothetical Data To Illustrate the Procedure
of the Signed Rank Test

Observations                   Rank of           Signed Rank
    (X)         X - M_0       |X - M_0|      Positive    Negative
   12.55          0.55            3              3
   14.62          2.62            8              8
   12.93          0.93            4              4
   12.46          0.46            2              2
   11.95         -0.05            1                          -1
   14.55          2.55            7              7
   13.11          1.11            6              6
   10.90         -1.10            5                          -5
Total                                           30           -6

15.4     THE RUN TEST

Among  other  things,  the  theory  of  runs  may  be  used  to  test  the 
following  two  hypotheses: 

(1)  The  observations  have  been  drawn  at  random  from  a  single 
population. 

(2)  Two  random  samples  come  from  populations  having  the  same 
distribution. 

Because  the  mathematics  of  the  theory  of  runs  is  quite  involved,  we 
shall  do  no  more  than  sketch  the  approach.  Those  persons  who  need  to 
know  the  details  are  advised  to  consult  the  references  at  the  end  of 
the  chapter. 

Case  1 

(a)  List  the  observations  in  the  order  in  which  they  were  obtained, 
that  is,  in  their  order  of  occurrence. 

(b)  Determine  the  sample  median. 

(c)  Denote  observations  below  the  median  by  minus  signs  and 
observations  above  the  median  by  plus  signs. 

(d) Denote the number of minus signs by n_1 and the number of
plus signs by n_2.

(e)  Count  the  number  of  runs3  and  denote  this  number  by  r. 

(f)  If  r  is  less  than  or  equal  to  the  critical  value  tabulated  in 
Appendix  15  (Table  1)  or  greater  than  or  equal  to  the  critical 
value  tabulated  in  Appendix  15  (Table  2),  the  hypothesis  is 
rejected  at  the  5  per  cent  significance  level. 

Case  2 

(a) List the n_1 + n_2 observations from the two samples in order of
magnitude, that is, arrange them in one sequence according to
their values.

(b) Denoting observations from one population by x's and
observations from the other population by y's, count the number
of runs.

(c) Denote the observed number of runs by r.

(d) If r is less than or equal to the critical value tabulated in
Appendix 15 (Table 1), the hypothesis is rejected at the 5 per
cent significance level.
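
Counting runs for either case is mechanical and can be sketched in Python. The function names below are hypothetical. Two details are assumptions, since the procedure above does not address them: observations equal to the sample median are dropped in Case 1, and ties between the two samples in Case 2 are broken by the sample label. The data are the washer diameters of Table 15.3.

```python
from itertools import groupby
from statistics import median

def count_runs(labels):
    """Number of maximal blocks of identical consecutive labels."""
    return sum(1 for _ in groupby(labels))

def runs_about_median(observations):
    """Case 1: return (n1, n2, r) for signs about the median, in order of occurrence."""
    m = median(observations)
    # assumption: observations equal to the median carry no sign and are dropped
    signs = ['+' if x > m else '-' for x in observations if x != m]
    return signs.count('-'), signs.count('+'), count_runs(signs)

def runs_two_samples(xs, ys):
    """Case 2: pool both samples, sort by value, count runs of the sample labels."""
    pooled = sorted([(v, 'x') for v in xs] + [(v, 'y') for v in ys])
    return count_runs(label for _, label in pooled)

# Outside diameters from Table 15.3
line_a = [1.63, 1.68, 1.59, 1.64, 1.70, 1.58, 1.62, 1.71, 1.57, 1.84, 1.90, 1.96]
line_b = [1.65, 1.69, 1.72, 1.91, 1.74, 1.75, 1.55, 1.86, 1.87, 1.88]
print(runs_two_samples(line_a, line_b))
```

The observed r is then compared with the critical values of Appendix 15, which are not reproduced in this sketch.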

Example  15.5 

Suppose a manufacturing process is turning out washers, and the
characteristic of interest is the outside diameter. In the first 40 washers
tested, there were 16 runs above and below the sample median. Noting
that n_1 = n_2 = 20, we refer to Appendix 15 and find that r_l = 14 < 16
< 28 = r_u. Thus, at the 5 per cent significance level we are unable to
reject the hypothesis that the 40 observations constitute a random sample
from a single population.

³ In terms of our symbols, a run is a sequence of signs of the same kind bounded
by signs of the other kind.

Example 15.6

Consider the data of Table 15.3. Listing according to ranks, we have:
B, AAAAAA, B, A, B, AA, BBB, A, BBB, A, B, A. That is, we have
r = 12 runs. In addition, n_1 = 12 and n_2 = 10. Reference to Appendix 15
(Table 1) tells us that, since r = 12 > r_c = 7, we are unable to reject the
hypothesis that the two random samples came from populations having
the same distribution. In other words, using the run test and operating
at the 5 per cent significance level, we are unable to reject the hypothesis
that the two lines are producing equivalent product.

TABLE 15.3-Outside Diameters of Washers Produced
by Two Different Production Lines (Figures in
Parentheses Are the Ranks)

   Line A         Line B
 1.63 (6)       1.65 (8)
 1.68 (9)       1.69 (10)
 1.59 (4)       1.72 (13)
 1.64 (7)       1.91 (21)
 1.70 (11)      1.74 (14)
 1.58 (3)       1.75 (15)
 1.62 (5)       1.55 (1)
 1.71 (12)      1.86 (17)
 1.57 (2)       1.87 (18)
 1.84 (16)      1.88 (19)
 1.90 (20)
 1.96 (22)

15.5     THE   KOLMOGOROV-SMIRNOV  TEST  OF 
GOODNESS  OF   FIT 

An alternative to the chi-square goodness of fit test described in
Section 7.15 is provided by the Kolmogorov-Smirnov test to be
described here. Since the Kolmogorov-Smirnov test is more powerful
than the chi-square test, its use is to be encouraged. It proceeds as
follows:

(1) Let F(x) be the completely specified theoretical cumulative
distribution function under the null hypothesis.

(2) Let S_n(x) be the sample c.d.f. based on n observations. For any
observed x, S_n(x) = k/n where k is the number of observations
less than or equal to x.

(3) Determine the maximum deviation, D, defined by

\[ D = \max_x \, | F(x) - S_n(x) | . \]

(4) If, for the chosen significance level, the observed value of D is
greater than or equal to the critical value tabulated in
Appendix 16, the hypothesis will be rejected.
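
Steps (1) through (3) can be sketched in Python for data tabulated as counts at x = 0, 1, 2, .... The function names are hypothetical; the frequencies are those of Table 15.4, and the Poisson mean of 10.5 and the large-sample 1 per cent value 1.63/√n are taken from Example 15.7. Because both F(x) and S_n(x) are step functions that jump only at the integers, comparing them at the integers suffices.

```python
from math import exp, sqrt

def poisson_cdf(lam):
    """Return a function x -> P(X <= x) for a Poisson variable with mean lam."""
    def cdf(x):
        term = total = exp(-lam)
        for k in range(1, x + 1):
            term *= lam / k
            total += term
        return total
    return cdf

def ks_statistic_counts(frequencies, cdf):
    """Return (D, n), with D = max |F(x) - S_n(x)| over x = 0, 1, 2, ..."""
    n = sum(frequencies)
    cumulative = 0
    d = 0.0
    for x, f in enumerate(frequencies):
        cumulative += f                         # k = number of observations <= x
        d = max(d, abs(cdf(x) - cumulative / n))
    return d, n

# Observed frequencies of 0, 1, ..., 22 busy senders (Table 15.4)
busy = [0, 5, 14, 24, 57, 111, 197, 278, 378, 418, 461, 433, 413, 358,
        219, 145, 109, 57, 43, 16, 7, 8, 3]
d, n = ks_statistic_counts(busy, poisson_cdf(10.5))
critical = 1.63 / sqrt(n)                       # 1 per cent value used in Example 15.7
print(round(d, 3), round(critical, 3))
```

The hypothesis is rejected when `d` is greater than or equal to `critical`, in line with step (4).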
Example 15.7

To illustrate the Kolmogorov-Smirnov test of goodness of fit, we shall
apply it to the data of Table 7.11. These data are reproduced here as
Table 15.4. To test the hypothesis that the data constitute a random
sample from a Poisson population with a mean of 10.44, calculations
are carried out as shown in Table 15.4. The values of F(x) were found
by consulting Appendix 2 and using λ = 10.5. (NOTE: If a more precise
evaluation is needed, F(x) should be determined using λ = 10.44. The
approximate value, 10.5, was used since λ = 10.44 would require
interpolation in Appendix 2.) Since D = max |F(x) - S_n(x)| = 0.013 <
1.63/√3754 = 0.027, the hypothesis may not be rejected at the 1 per cent
significance level. The reader should compare this result with that
obtained in Example 7.27.

TABLE 15.4-Application of the Kolmogorov-Smirnov Goodness of Fit
Test to the Number of Busy Senders in a Telephone Exchange

                       Observed      Relative       Expected Relative
Number    Observed    Cumulative    Cumulative         Cumulative
 Busy     Frequency   Frequency     Frequency          Frequency         |F(x) - S_n(x)|
                                      S_n(x)              F(x)
   0          0            0          0                   0                  0
   1          5            5          0.001               0                  0.001
   2         14           19          0.005               0.002              0.003
   3         24           43          0.011               0.007              0.004
   4         57          100          0.027               0.021              0.006
   5        111          211          0.056               0.050              0.006
   6        197          408          0.109               0.102              0.007
   7        278          686          0.183               0.179              0.004
   8        378         1064          0.283               0.279              0.004
   9        418         1482          0.395               0.397              0.002
  10        461         1943          0.518               0.521              0.003
  11        433         2376          0.633               0.639              0.006
  12        413         2789          0.743               0.742              0.001
  13        358         3147          0.838               0.825              0.013
  14        219         3366          0.897               0.888              0.009
  15        145         3511          0.935               0.932              0.003
  16        109         3620          0.964               0.960              0.004
  17         57         3677          0.979               0.978              0.001
  18         43         3720          0.991               0.988              0.003
  19         16         3736          0.995               0.994              0.001
  20          7         3743          0.997               0.997              0
  21          8         3751          0.999               0.999              0
  22          3         3754          1.000               0.999              0.001

Data Source: Thornton C. Fry, Probability and Its Engineering Uses. D. Van Nostrand
Company, Inc., New York, 1928, p. 295.

15.6     MEDIAN TESTS

The procedures to be described in this section are of value when
testing the following hypotheses:

(1) That k random samples were drawn from identically
distributed populations, k ≥ 2.

(2) That, in a one-factor experiment, the k levels of the factor
have the same effect.

(3) That, in a two-factor experiment, (a) the a levels of factor a
have the same effect, (b) the b levels of factor b have the same
effect, and (c) there is no true interaction between factors
a and b.

For our purposes, it is deemed sufficient to concentrate on Case 1.
Those persons wishing to investigate Cases 2 and 3 are referred to
Brown and Mood (4, 5) and Mood (22).

If k random samples consisting of n_1, ..., n_k observations,
respectively, are available, determine the numbers of observations in each
sample that are above and below the median of the combined samples.
These data may then be analyzed as a 2 × k contingency table in the
manner specified in Section 7.17.

Example  15.8 

Consider the two samples in Table 15.3. Examination of these data
leads to the 2 × 2 contingency table shown in Table 15.5. Using
Equation (7.27), we obtain

\[ \chi^2 = \frac{22\,\bigl(\,|(4)(3) - (8)(7)| - 11\,\bigr)^2}{(12)(10)(11)(11)} = 1.65. \]

Since χ² = 1.65 < χ²_{.95}(1) = 3.84, we are unable to reject the hypothesis
that the two random samples were drawn from identically distributed
populations.

TABLE 15.5-Contingency Table Formed From the Data
of Table 15.3

                  Line A     Line B     Total
Above median         4          7         11
Below median         8          3         11
Total               12         10         22
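
The construction of the 2 × k table can be sketched in Python. The function names are hypothetical. Two assumptions are made: observations equal to the combined median are left out of both rows, and Equation (7.27) is taken to be the usual continuity-corrected chi-square for a 2 × 2 table, as the arithmetic in Example 15.8 suggests. The data are those of Table 15.3.

```python
from statistics import median

def median_table(*samples):
    """Counts above and below the median of the combined samples (a 2 x k table)."""
    m = median([v for s in samples for v in s])
    above = [sum(1 for v in s if v > m) for s in samples]
    below = [sum(1 for v in s if v < m) for s in samples]   # values equal to m are omitted
    return above, below

def chi_square_2x2_corrected(a, b, c, d):
    """Continuity-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    shrunk = max(0.0, abs(a * d - b * c) - n / 2)           # correction never goes below 0
    return n * shrunk ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Outside diameters from Table 15.3
line_a = [1.63, 1.68, 1.59, 1.64, 1.70, 1.58, 1.62, 1.71, 1.57, 1.84, 1.90, 1.96]
line_b = [1.65, 1.69, 1.72, 1.91, 1.74, 1.75, 1.55, 1.86, 1.87, 1.88]
above, below = median_table(line_a, line_b)
chi2 = chi_square_2x2_corrected(above[0], above[1], below[0], below[1])
print(above, below, round(chi2, 2))
```

The statistic is compared with the 95th percentile of chi-square with 1 degree of freedom, 3.84. For k > 2 the same table would be analyzed by the general contingency-table method of Section 7.17, which this sketch does not cover.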

Problems 

15.1  Apply the method described in Section 15.2 to the following problems:

(a) 7.29
(b) 7.32
(c) 7.33

Discuss any differences between the method used here and that used
in Chapter 7.

15.2  Apply the method described in Section 15.3 to the following problems:

(a) 7.29
(b) 7.32
(c) 7.33

Discuss your results with reference to those obtained in Problem 15.1
and in Chapter 7.

15.3  Apply the method described in Section 15.3 to the following problems:

(a) 7.1
(b) 7.2(d)
(c) 7.3

State any changes in the assumptions and the wording of the
hypotheses that you make. Discuss the implications of these changes.
Compare the results of using distribution-free tests with those
obtained using parametric tests in Chapter 7.

15.4  Apply the method described in Section 15.4 to the data given in the
following problems:

(a) 6.1       (i) 7.30
(b) 6.5       (j) 7.31
(c) 6.20      (k) 7.33
(d) 6.22      (l) 7.35
(e) 6.23      (m) 7.36
(f) 7.1       (n) 7.37
(g) 7.2       (o) 7.39
(h) 7.3       (p) 7.40

In each case, state the hypothesis being tested and specify any
assumptions you make.

15.5  Apply the method described in Section 15.5 to check on the
assumption of normality in the following problems:

(a) 6.5
(b) 7.1

15.6  Apply the method described in Section 15.6 to the data given in the
following problems:

(a) 7.30      (g) 7.40
(b) 7.31      (h) 11.9
(c) 7.35      (i) 11.10
(d) 7.36      (j) 11.11
(e) 7.37      (k) 11.12
(f) 7.39

Compare the conclusions reached here with those reached in the earlier
chapters. Discuss any discrepancies.
CHAPTER 16

STATISTICAL  QUALITY  CONTROL 

SINCE  THE  EARLY  1940's  the  use  of  statistical  methods  in  industry  has 
been  on  the  upswing.  This  increased  use  of  statistics  has  been  par 
ticularly  noticeable  in  two  areas:  (1)  research  and  development  and  (2) 
reliability  and  quality  control.  Because  the  major  part  of  this  book 
has  been  concerned  with  those  statistical  methods  that  have  proved 
most  valuable  in  research  and  development,  this  chapter  will  be 
devoted  to  a  presentation  of  two  special  techniques  especially  useful 
for  controlling  quality  and  reliability.  (NOTE:  This  is  not  to  imply 
that  these  methods  are  of  no  value  to  those  persons  located  in  research 
and  development  organizations;  it  is  only  a  statement  of  relative 
importance.) 

The  two  techniques  not  discussed  in  the  preceding  chapters  and 
which  are  especially  useful  in  controlling  and  improving  quality  and 
reliability  are  control  charts  and  acceptance  sampling  plans.  Because 
these  two  techniques  are  discussed  at  great  length  in  books  devoted 
entirely to the subject of statistical quality control, only a brief
outline of each will be given. However, the material to be presented will
be sufficient to acquaint the reader with the basic concepts. Those
persons in need of more detailed explanations are referred to the
publications listed at the end of the chapter.

16.1      CONTROL CHARTS

Although  control  charts  may  prove  useful  in  many  situations  they 
are  most  commonly  employed  in  the  analysis  and  control  of  production 
processes.  For  this  reason,  discussion  of  these  charts  will  be  in  terms 
perhaps  more  familiar  to  the  engineer  than  to  the  research  worker. 

It  has  long  been  recognized  that  some  variation  is  inevitable  in  any 
repetitive  process.  For  example,  Grant  (16)  states: 

Measured quality of manufactured product is always subject to a certain
amount of variation as a result of chance. Some stable "system of chance
causes" is inherent in any particular scheme of production and inspection.
Variation within this stable pattern is inevitable. The reasons for variation
outside this stable pattern may be discovered and corrected.1

Taking  Grant  at  his  word,  what  we  seek,  then,  are  tests  for  detecting 
unnatural  patterns  in  the  plotted  data. 

The control chart, as conceived and developed by Shewhart (21), is
a simple pictorial device for detecting unnatural patterns of variation
in data resulting from repetitive processes. That is, the control chart
provides criteria for detecting lack of statistical control. (NOTE:
When a process is operating under a constant system of chance causes,
it is said to be in statistical control.) Rather than go into details
concerning the theory underlying control charts, we shall be content with:
(1) indicating the proper procedures for constructing the charts, (2)
stating criteria to be used for indicating unnatural patterns of
variation, and (3) giving numerical examples of the four most commonly
used charts.

1  E. L. Grant, Statistical Quality Control, Second Edition, McGraw-Hill Book
Company, Inc., New York, 1952, p. 3.

Basically, all control charts appear as in Figure 16.1. The sample
points are, of course, plotted in a sequential manner, that is, as
obtained. The plotted points are joined together solely as an aid to the
visual interpretation.

FIG. 16.1 - Illustration of the general appearance of a control chart.

Now, what tests should be employed to detect unnatural patterns in
the  plotted  data?  Depending  on  the  degree  of  effectiveness  desired, 
there  are  many  criteria  that  might  be  used.  However,  since  our  sole 
purpose  is  to  indicate  the  basic  nature  of  the  control  chart  technique, 
only  the  most  common  tests  will  be  mentioned.  As  mentioned  earlier, 
those  persons  wishing  to  investigate  more  sophisticated  tests  should 
consult  the  references  at  the  end  of  the  chapter. 

The most common tests for unnatural patterns are tests for
instability, that is, tests for determining if the cause system is
changing. As commonly employed, they refer to the A, B, and C zones shown
in Figure 16.2. With reference to these zones, the observed pattern of
variation is said to be unnatural, or the process is said to be "out of
control," if any one or more of the following events occurs:2

(1)  A  single  point  falls  outside  of  the  control  limit;  i.e.,  beyond 
zone  A. 

(2)  Two  out  of  three  successive  points  fall  in  zone  A  or  beyond. 

(3)  Four  out  of  five  successive  points  fall  in  zone  B  or  beyond. 

(4)  Eight  points  in  succession  fall  in  zone  C  or  beyond. 

It  should  be  noted  that  the  above  tests  apply  to  both  halves  of  a  control 
chart  but  they  are  applied  separately  to  each  half,  not  to  the  two  halves 
in  combination. 
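The four tests listed above can be sketched in code. The following is only an illustrative sketch, not a procedure given in the text: the function name `zone_signals` is hypothetical, `sigma` is assumed to be one-third of the distance from the center line to a control limit (so that zones C, B, and A are the first, second, and third thirds), a point lying exactly on a zone boundary is assumed not to count, and the point that completes a pattern is the one flagged.

```python
def zone_signals(points, center, sigma):
    """Return {index: [test numbers]} for points that complete one of the
    four instability tests. Indices are 0-based positions in `points`."""
    # test number: (window length, points required, cut-off in sigma units)
    tests = {
        1: (1, 1, 3.0),  # a single point beyond zone A
        2: (3, 2, 2.0),  # two of three successive points in zone A or beyond
        3: (5, 4, 1.0),  # four of five successive points in zone B or beyond
        4: (8, 8, 0.0),  # eight successive points in zone C or beyond
    }
    flagged = {}
    for side in (+1, -1):  # each half of the chart is examined separately
        z = [side * (x - center) / sigma for x in points]
        for number, (window, needed, cut) in tests.items():
            for i in range(window - 1, len(z)):
                recent = z[i - window + 1 : i + 1]
                hits = sum(1 for value in recent if value > cut)
                # Assumption: the current point must itself lie in the zone.
                if hits >= needed and z[i] > cut:
                    flagged.setdefault(i, []).append(number)
    return flagged


if __name__ == "__main__":
    # Hypothetical data: center line 10, one zone width equal to 1.
    print(zone_signals([10.5, 9.5, 13.5], center=10.0, sigma=1.0))
```

Because each half is scanned on its own, two points in zone A on opposite sides of the center line do not trigger test (2), which matches the remark that the tests are not applied to the two halves in combination.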

FIG. 16.2 - Diagram defining the A, B, and C zones used in control
chart analyses. Reading from the control limit toward the center line,
the zones are A, B, and C. (Each of the zones, A, B, and C, constitutes
one-third of the area between the center line and the control limit.)

Before presenting numerical illustrations of control chart
applications, it is necessary that the four most commonly used charts be
introduced and that formulas be given for the calculation of the center
lines and control limits. The charts most often encountered are: (1) the
$\bar{X}$ chart, (2) the R chart, (3) the p chart, and (4) the c chart.
The first two of these charts deal with measurement data while the last
two deal with attribute (enumeration) data. The pertinent assumptions and
the formulas for the control limits are specified in Table 16.1.

Example  16.1 

Consider the data of Table 16.2. It may be verified that
$\bar{\bar{X}} = \sum \bar{X}/k = 213.20/20 = 10.66$ and
$\bar{R} = \sum R/k = 31.8/20 = 1.59$. Then, using the


2  The tests described here may be used when the two control limits are
reasonably symmetrically located with respect to the center line. If the
limits are decidedly asymmetric, the tests should be modified as described
in (33).
TABLE 16.1 - Assumptions and Formulas for the Most Commonly Used Control Charts

Chart       Assumed        Center            Upper Control                                Lower Control
            Distribution   Line              Limit (UCL)                                  Limit (LCL)

$\bar{X}$   Normal         $\bar{\bar{X}}$   $\bar{\bar{X}} + A_2\bar{R}$                 $\bar{\bar{X}} - A_2\bar{R}$
R           Normal         $\bar{R}$         $D_4\bar{R}$                                 $D_3\bar{R}$
p           Binomial       $\bar{p}$         $\bar{p} + 3\sqrt{\bar{p}(1-\bar{p})/n}$     $\bar{p} - 3\sqrt{\bar{p}(1-\bar{p})/n}$
c           Poisson        $\bar{c}$         $\bar{c} + 3\sqrt{\bar{c}}$                  $\bar{c} - 3\sqrt{\bar{c}}$

The constants $A_2$, $D_3$, and $D_4$ are given in Appendix 8, while the
quantities $\bar{\bar{X}}$, $\bar{R}$, $\bar{p}$, and $\bar{c}$ are
calculated from the sample data as shown in the numerical examples which
follow.

formulas given in Table 16.1, we see that, for the $\bar{X}$ chart,
UCL = 10.66 + (0.58)(1.59) = 11.58 and LCL = 10.66 - (0.58)(1.59) = 9.74.
Similarly, for the R chart, UCL = (2.11)(1.59) = 3.35 and
LCL = (0)(1.59) = 0. The resulting charts are shown in Figure 16.3. The
tests for unnatural patterns have been applied to the $\bar{X}$ chart and
the potential trouble

TABLE 16.2 - Coded Values of the Crushing Strengths of Concrete Blocks

Sample              Individual Values                 Mean         Range
Number     X1      X2      X3      X4      X5      ($\bar{X}$)      (R)

  1       11.1     9.4    11.2    10.4    10.1       10.44          1.8
  2        9.6    10.8    10.1    10.8    11.0       10.46          1.4
  3        9.7    10.0    10.0     9.8    10.4        9.98          0.7
  4       10.1     8.4    10.2     9.4    11.0        9.82          2.6
  5       12.4    10.0    10.7    10.1    11.3       10.90          2.4
  6       10.1    10.2    10.2    11.2    10.1       10.36          1.1
  7       11.0    11.5    11.8    11.0    11.3       11.32          0.8
  8       11.2    10.0    10.9    11.2    11.0       10.86          1.2
  9       10.6    10.4    10.5    10.5    10.9       10.58          0.5
 10        8.3    10.2     9.8     9.5     9.8        9.52          1.9
 11       10.6     9.9    10.7    10.2    11.4       10.56          1.5
 12       10.8    10.2    10.5     8.4     9.9        9.96          2.4
 13       10.7    10.7    10.8     8.6    11.4       10.44          2.8
 14       11.3    11.4    10.4    10.6    11.1       10.96          1.0
 15       11.4    11.2    11.4    10.1    11.6       11.14          1.5
 16       10.1    10.1     9.7     9.8    10.5       10.04          0.8
 17       10.7    12.8    11.2    11.2    11.3       11.44          2.1
 18       11.9    11.9    11.6    12.4    11.4       11.84          1.0
 19       10.8    12.1    11.8     9.4    11.6       11.14          2.7
 20       12.4    11.1    10.8    11.0    11.9       11.44          1.6

Average                                              10.66          1.59

FIG. 16.3 - Control charts for the data of Table 16.2.

spots  tagged  in  the  customary  manner.  It  is  suggested  that  the  reader 
consider  the  application  of  these  tests  to  the  R  chart. 
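The arithmetic of Example 16.1 can be retraced with a short script. This is a minimal sketch rather than part of the original example: the function name `xbar_r_limits` is hypothetical, the sample means and ranges are those of Table 16.2, and the constants 0.58, 0, and 2.11 are the values of $A_2$, $D_3$, and $D_4$ used in the example for samples of size 5.

```python
# Sample means and ranges from Table 16.2 (k = 20 samples of size n = 5).
means = [10.44, 10.46, 9.98, 9.82, 10.90, 10.36, 11.32, 10.86, 10.58, 9.52,
         10.56, 9.96, 10.44, 10.96, 11.14, 10.04, 11.44, 11.84, 11.14, 11.44]
ranges = [1.8, 1.4, 0.7, 2.6, 2.4, 1.1, 0.8, 1.2, 0.5, 1.9,
          1.5, 2.4, 2.8, 1.0, 1.5, 0.8, 2.1, 1.0, 2.7, 1.6]

# Constants for n = 5 as used in Example 16.1.
A2, D3, D4 = 0.58, 0.0, 2.11


def xbar_r_limits(means, ranges, a2, d3, d4):
    """Return (LCL, center line, UCL) for the X-bar chart and the R chart."""
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    xbar = (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar)
    r = (d3 * r_bar, r_bar, d4 * r_bar)
    return xbar, r


xbar_limits, r_limits = xbar_r_limits(means, ranges, A2, D3, D4)

# Sample numbers (1-based) whose means plot outside the X-bar control limits.
outside = [i + 1 for i, m in enumerate(means)
           if not xbar_limits[0] <= m <= xbar_limits[2]]

if __name__ == "__main__":
    print([round(v, 2) for v in xbar_limits])
    print([round(v, 2) for v in r_limits])
    print(outside)
```

With these data the sketch reproduces the limits 9.74 and 11.58 for the $\bar{X}$ chart and 0 and 3.35 for the R chart, and it reports that the means of samples 10 and 18 fall outside the $\bar{X}$ limits. Only the single-point test is checked here; the zone tests would have to be applied separately.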

Example  16.2 

Consider the data of Table 16.3. Here we are dealing with enumeration
data, namely, the number of defective fuses in samples of size 50
taken at random times during the production process. It is easily verified
that $\bar{p} = \sum p/k = 1.68/40 = 0.042$. Using the formulas specified
in Table 16.1, we obtain

$$\text{UCL} = 0.042 + 3\sqrt{(0.042)(0.958)/50} = 0.127$$

and

$$\text{LCL} = 0.042 - 3\sqrt{(0.042)(0.958)/50} = -0.043.$$

(NOTE: Since the formula leads to a negative value for the lower control
limit and because the fraction defective is a nonnegative quantity, the
lower control limit is arbitrarily set at 0. This, of course, makes the
control limits asymmetric with respect to the center line. The tests for

TABLE 16.3 - Number of Defective Fuses in Random Samples of Size 50

Sample Number     Number of Defectives     Fraction Defective (p)

      1                    2                       0.04
      2                    1                       0.02
      3                    2                       0.04
      4                    0                       0.00
      5                    2                       0.04
      6                    3                       0.06
      7                    4                       0.08
      8                    2                       0.04
      9                    0                       0.00
     10                    3                       0.06
     11                    0                       0.00
     12                    1                       0.02
     13                    2                       0.04
     14                    2                       0.04
     15                    3                       0.06
     16                    5                       0.10
     17                    1                       0.02
     18                    2                       0.04
     19                    3                       0.06
     20                    1                       0.02
     21                    1                       0.02
     22                    1                       0.02
     23                    4                       0.08
     24                    2                       0.04
     25                    2                       0.04
     26                    4                       0.08
     27                    1                       0.02
     28                    3                       0.06
     29                    3                       0.06
     30                    2                       0.04
     31                    3                       0.06
     32                    6                       0.12
     33                    2                       0.04
     34                    3                       0.06
     35                    2                       0.04
     36                    3                       0.06
     37                    1                       0.02
     38                    0                       0.00
     39                    2                       0.04
     40                    0                       0.00

Average                                            0.042

FIG. 16.4 - Control chart for the data of Table 16.3.

unnatural  patterns  specified  earlier  have,  therefore,  not  been  applied. 
The  application  of  the  modified  tests  is  left  as  an  exercise  for  the  reader. 
The  results  of  the  preliminary  analysis  are  presented  in  Figure  16.4.) 
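The p chart calculation of Example 16.2 can be checked in the same way. The sketch below is illustrative only: the function name `p_chart_limits` is hypothetical, the counts are those of Table 16.3, and a negative lower limit is replaced by 0 as described in the note above.

```python
from math import sqrt

# Number of defective fuses in each of the 40 samples of Table 16.3.
defectives = [2, 1, 2, 0, 2, 3, 4, 2, 0, 3,
              0, 1, 2, 2, 3, 5, 1, 2, 3, 1,
              1, 1, 4, 2, 2, 4, 1, 3, 3, 2,
              3, 6, 2, 3, 2, 3, 1, 0, 2, 0]
n = 50  # items per sample


def p_chart_limits(defectives, n):
    """Return (LCL, center line, UCL) for a p chart with constant sample size."""
    p_bar = sum(defectives) / (n * len(defectives))
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
    # A negative lower limit is set to 0, since a fraction cannot be negative.
    return max(0.0, p_bar - half_width), p_bar, p_bar + half_width


lcl, p_bar, ucl = p_chart_limits(defectives, n)
above = [i + 1 for i, d in enumerate(defectives) if d / n > ucl]

if __name__ == "__main__":
    print(lcl, round(p_bar, 3), round(ucl, 3), above)
```

For these data the center line is 0.042, the upper limit rounds to 0.127, and no sample fraction exceeds it; the largest observed fraction, 0.12 in sample 32, lies just below the limit.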

Example  16.3 

Consider now a situation in which the characteristic of interest is the
number of defects per unit. In such a case, a Poisson distribution would
undoubtedly be assumed and a c chart would be appropriate. Given the
data in Table 16.4, it is easily verified that
$\bar{c} = \sum c/k = 144/24 = 6$. Using the formulas specified in Table
16.1, we obtain UCL $= 6 + 3\sqrt{6} = 13.35$ and
LCL $= 6 - 3\sqrt{6} = -1.35$. (NOTE: As in Example 16.2, the
negative lower control limit will arbitrarily be changed to 0 since c is,
by definition, a nonnegative quantity.) The results of the analysis are
plotted in Figure 16.5.
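A corresponding sketch for the c chart of Example 16.3 follows. It is an illustration under stated assumptions, not part of the original example: the function name `c_chart_limits` is hypothetical and the counts are the 24 values of Table 16.4.

```python
from math import sqrt

# Number of defects per seam for the 24 samples of Table 16.4.
defects = [2, 4, 7, 3, 1, 4, 8, 9,
           5, 3, 7, 11, 6, 4, 9, 9,
           6, 4, 3, 9, 7, 4, 7, 12]


def c_chart_limits(counts):
    """Return (LCL, center line, UCL) for a c chart."""
    c_bar = sum(counts) / len(counts)
    half_width = 3 * sqrt(c_bar)
    # A count cannot be negative, so a negative lower limit is set to 0.
    return max(0.0, c_bar - half_width), c_bar, c_bar + half_width


lcl, c_bar, ucl = c_chart_limits(defects)
above = [i + 1 for i, c in enumerate(defects) if c > ucl]

if __name__ == "__main__":
    print(lcl, c_bar, round(ucl, 2), above)
```

The sketch gives a center line of 6 and an upper limit of 13.35, and it confirms that no count plots above the upper control limit.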


FIG. 16.5 - Control chart for the data of Table 16.4. (Horizontal axis: sample number.)
TABLE 16.4 - Number of Defects Observed in a Welded Seam (Each Count
Taken on a Single Seam; the Welder Produced 8 Seams per Hour)

Sample                    Time of         Number of
Number      Date          Sample          Defects (c)

   1        July 18        8:00 A.M.           2
   2                       9:05 A.M.           4
   3                      10:10 A.M.           7
   4                      11:00 A.M.           3
   5                      12:30 P.M.           1
   6                       1:35 P.M.           4
   7                       2:20 P.M.           8
   8                       3:30 P.M.           9
   9        July 19        8:10 A.M.           5
  10                       9:00 A.M.           3
  11                      10:05 A.M.           7
  12                      11:15 A.M.          11
  13                      12:25 P.M.           6
  14                       1:30 P.M.           4
  15                       2:30 P.M.           9
  16                       3:40 P.M.           9
  17        July 20        8:00 A.M.           6
  18                       8:55 A.M.           4
  19                      10:00 A.M.           3
  20                      11:10 A.M.           9
  21                      12:25 P.M.           7
  22                       1:30 P.M.           4
  23                       2:20 P.M.           7
  24                       3:30 P.M.          12

Total                                        144

Source: E. L. Grant, Statistical Quality Control, Second Edition, McGraw-Hill Book
Company, Inc., New York, 1952, p. 33. By permission of the author and publishers.

Examination  of  Figure  16.5  will  reveal  that  some  modifications  have 
been  made  in  the  usual  method  of  presentation,  namely,  the  various 
half-days  have  been  separated  (i.e.,  not  connected)  to  emphasize  the 
breaks  between  work  periods.  Now,  it  will  be  noted  that  two  conclusions 
are  obvious:  (1)  no  points  plot  above  the  upper  control  limit,  and  (2) 
essentially  the  same  pattern  appears  in  each  half-day.  Study  of  the 
recurring pattern suggests the existence of a fatigue factor. [NOTE: For
further discussion of this example, consult Grant (16, pp. 32-35).]

Before leaving the topic of control charts, some additional remarks
are necessary. These will, however, be very brief and in no particular
order.

1.  The primary reason for using control charts is to provide a signal
that some action is desirable.

2.  Before control limits may be calculated with any assurance of their
being reliable, at least 20 subgroups (samples) should be available.

3.  Before the control limits, calculated from past production records,
are used to monitor future production, the process should be in
control.

4.  If a process is in control and a point plots outside the control limits,
the taking of action may be viewed as committing a Type I error.

5.  The control chart is not a panacea for all production problems; it is
only another useful tool.

16.2      ACCEPTANCE  SAMPLING   PLANS 

Acceptance sampling, or sampling inspection, is of two types:
lot-by-lot sampling and sampling of continuous production. In this brief
exposure to the concepts and procedures of acceptance sampling, only the
first of these two types will be discussed. Persons desiring information
on continuous sampling plans are referred to the publications listed at
the end of the chapter. In addition to the distinction between lot-by-lot
and continuous sampling, it is customary to classify sampling plans
as either attributes or variables plans. Attributes plans refer to those
cases in which each item is classified simply as either defective or
nondefective; variables plans refer to those cases in which a measurement
is taken and recorded (numerically) on each item inspected. In this
section only attributes plans will be considered. There is one other way
in which acceptance sampling plans may be classified, namely, as
single, double, or multiple (including sequential) plans. These categories
refer, of course, to the number of samples selected and will become
clear as the exposition continues.

An attributes single sampling plan which operates on a lot-by-lot basis
is completely defined by three numbers: the lot size, N; the sample
size, n; and the acceptance number, a.3 Such a plan operates as follows:

(1)  A  single  sample  of  n  items  is  selected,  by  chance,  from  a  lot  of 
N  items. 

(2)  Each  item  in  the  sample  is  then  classified  as  either  defective 
or  nondefective. 

(3)  If  the  number  of  defective  items  in  the  sample  does  not  exceed 
a,  the  lot  is  accepted. 

(4)  If  the  number  of  defective  items  in  the  sample  exceeds  a,  the 
lot  is  rejected. 

As  with  all  statistical  procedures,  the  risks  associated  with  decisions 
(inferences)  resulting  from  sampling  inspection  must  be  assessed.  The 
customary  manner  of  presenting  these  risks  is  by  means  of  a  graph  of 
the OC function of the sampling plan, that is, by plotting the
probability of accepting the lot as a function of the fraction defective of the
lot.  The  protection  afforded  by  various  sampling  plans  may  then  be 
compared  by  examining  their  OC  curves. 

3  In  many  publications,  the  acceptance  number  is  denoted  by  c.  However,  I 
prefer  a  for  two  reasons:  it  stands  for  the  word  acceptance  and  it  permits  an 
easier  extension  to  double  and  multiple  sampling. 
For the single sampling plan specified previously, the OC function is

$$P_{acc} = \sum_{d=0}^{a} C(D, d)\cdot C(N - D,\, n - d)/C(N, n) \tag{16.1}$$

where D represents the number of defective items in the lot and d
represents the number of defective items in the sample. Equation (16.1)
may, of course, be evaluated for D = 0, 1, ..., N and the results
plotted as a series of ordinates erected at the corresponding values of
p = D/N, namely: 0, 1/N, 2/N, ..., 1. It is clear, however, that these
calculations may prove onerous unless a high-speed computer is
available. Fortunately, helpful tables of the hypergeometric function have
been provided by Lieberman and Owen (19). In addition, extensive
catalogs of OC curves for lot-by-lot, single sampling (by attributes)
plans have been prepared by Wiesen (34) and Clark and Koopmans (9).
If none of these publications is readily available and access to a
high-speed computer is impossible, the OC function may be approximated by

$$P_{acc} \cong \sum_{d=0}^{a} C(n, d)\, p^{d}(1 - p)^{n-d} \tag{16.2}$$

or

$$P_{acc} \cong \sum_{d=0}^{a} e^{-\lambda}\lambda^{d}/d! \tag{16.3}$$

where $\lambda = np$. (NOTE: The reader is referred to Chapter 5 for discussion
of the accuracy and relevancy of these approximations.)
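Equations (16.1) through (16.3) are straightforward to evaluate by machine. The following sketch is illustrative only: the function names are hypothetical, and the plan used at the bottom (N = 100, n = 2, a = 0, with D = 10 defectives in the lot) is chosen merely to make the three values easy to compare.

```python
from math import comb, exp, factorial


def oc_hypergeometric(N, n, a, D):
    """Exact probability of acceptance, Equation (16.1)."""
    return sum(comb(D, d) * comb(N - D, n - d) for d in range(a + 1)) / comb(N, n)


def oc_binomial(n, a, p):
    """Binomial approximation, Equation (16.2)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(a + 1))


def oc_poisson(n, a, p):
    """Poisson approximation, Equation (16.3), with lambda = n * p."""
    lam = n * p
    return sum(exp(-lam) * lam**d / factorial(d) for d in range(a + 1))


if __name__ == "__main__":
    N, n, a, D = 100, 2, 0, 10
    p = D / N
    print(oc_hypergeometric(N, n, a, D))  # exact value
    print(oc_binomial(n, a, p))           # binomial approximation
    print(oc_poisson(n, a, p))            # Poisson approximation
```

For this small plan the exact value is 4005/4950, or about 0.809, the binomial approximation is 0.81, and the Poisson approximation is about 0.819, which gives some feeling for the size of the approximation error in one particular case.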

Persons  familiar  with  the  presentation  of  OC  curves  for  sampling 
plans,  or  those  individuals  who  consult  the  references  given  in  the 
preceding  paragraph,  will  realize  that  it  is  not  customary  to  plot  OC 
functions  as  a  series  of  ordinates  (as  stated  in  the  preceding  paragraph) 
but  to  show  smooth  OC  curves.  That  is,  the  functions  are  plotted  as 
though  p  were  a  continuous  parameter.  For  lot-by-lot  plans,  this 
minor  "tampering  with  the  truth"  is  not  serious  and  the  resulting  gain 
in  ease  of  presentation  far  outweighs  the  inaccuracy  of  the  graph. 
Consequently,  OC  curves  usually  appear  as  in  Figure  16.6. 

Rather than plot the entire OC function, the practitioner frequently
contents himself with calculating two or three points on the curve. The
three points most commonly determined are:

$p_1$ = the value of p for which $P_{acc}$ = 0.95,

$p_2$ = the value of p for which $P_{acc}$ = 0.50, and

$p_3$ = the value of p for which $P_{acc}$ = 0.10.

These three values of p (that is, $p_1$, $p_2$, and $p_3$) are usually
referred to as the acceptable quality level (AQL), the indifference
quality, and the
FIG. 16.6 - Illustration of the general appearance of an OC curve.
(To aid in understanding the concepts, the AQL, RQL, consumer's
risk, and producer's risk are shown in this figure. Horizontal axis:
p = lot fraction defective.)

rejectable quality level (RQL),4 respectively. The AQL and RQL points,
as well as the associated expressions, consumer's risk and producer's
risk, have been shown on Figure 16.6. (NOTE: If, in the definitions of
$p_1$ and $p_3$, the values 0.95 and 0.10 are replaced by $1 - \alpha$ and
$\beta$, respectively, the reader will immediately see the close connection
between the ideas of this section and those discussed in Chapter 7.
Incidentally, the same remark may be made here as there, namely, the
values assigned to $\alpha$ and $\beta$ are arbitrary; the use of
$\alpha = 0.05$ and $\beta = 0.10$ is only a matter of custom.)
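The three points $p_1$, $p_2$, and $p_3$ can be located numerically once an OC function is available. The sketch below is one possible approach and not a method given in the text: it uses the binomial approximation (16.2), treats p as a continuous parameter, and finds each point by bisection. The function names and the plan n = 50, a = 2 are hypothetical.

```python
from math import comb


def oc_binomial(n, a, p):
    """Binomial approximation to the OC function, Equation (16.2)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(a + 1))


def quality_level(n, a, target, tol=1e-10):
    """Return the p at which P_acc equals `target`, found by bisection.

    Relies on P_acc falling from 1 at p = 0 to 0 at p = 1 (true when a < n).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if oc_binomial(n, a, mid) > target:
            lo = mid   # still accepting too often: move to worse quality
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    n, a = 50, 2  # hypothetical plan
    p1 = quality_level(n, a, 0.95)  # acceptable quality level (AQL)
    p2 = quality_level(n, a, 0.50)  # indifference quality
    p3 = quality_level(n, a, 0.10)  # rejectable quality level (RQL)
    print(p1, p2, p3)
```

As a simple check on the logic, a plan with n = 1 and a = 0 has $P_{acc} = 1 - p$, so the sketch should return 0.05 for a target of 0.95.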

One other common way of presenting the performance ability of
acceptance sampling plans is to calculate and plot the average outgoing
quality (AOQ) function. This function, restricted to cases in which the
testing is nondestructive, depends on the assumption that all rejected
lots are submitted to 100 per cent inspection, with all defective items
being removed and replaced by nondefective items. Under this
assumption, the average outgoing quality is determined to be

$$\text{AOQ} = p\cdot P_{acc} \tag{16.4}$$

4  Historically,  the  rejectable  quality  level  (RQL)  has  been  known  as  the  lot 
tolerance  per  cent  defective  (LTPD).  However,  in  recent  years,  the  expressions 
rejectable  quality  level  and  objectionable  quality  level  (OQL)  have  been  suggested, 
and  I  believe  that  rejectable  quality  level  will  soon  be  universally  adopted.  For 
this  reason,  it  is  used  in  this  book. 




where $P_{acc}$ is defined by Equation (16.1). The graph of a typical AOQ
function is shown in Figure 16.7. The maximum value of the average
outgoing quality is known as the average outgoing quality limit (AOQL).
From a practical point of view, the AOQL is, perhaps, the most
important descriptive measure associated with any acceptance sampling
plan. [NOTE: AOQ curves are also included in the catalogs of Wiesen
(34) and Clark and Koopmans (9).]
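The AOQ function and its maximum, the AOQL, can be tabulated directly from Equations (16.1) and (16.4). The following is a minimal sketch under stated assumptions: the function names and the plan N = 500, n = 50, a = 2 are hypothetical, and the AOQL is taken simply as the largest AOQ over D = 0, 1, ..., N.

```python
from math import comb


def oc_hypergeometric(N, n, a, D):
    """Probability of acceptance, Equation (16.1)."""
    return sum(comb(D, d) * comb(N - D, n - d) for d in range(a + 1)) / comb(N, n)


def aoq_curve(N, n, a):
    """Return [(p, AOQ)] for D = 0..N, using AOQ = p * P_acc, Equation (16.4)."""
    return [(D / N, (D / N) * oc_hypergeometric(N, n, a, D)) for D in range(N + 1)]


def aoql(N, n, a):
    """Return (p, AOQ) at the maximum of the AOQ curve."""
    return max(aoq_curve(N, n, a), key=lambda pair: pair[1])


if __name__ == "__main__":
    p_at_max, limit = aoql(500, 50, 2)  # hypothetical plan
    print(p_at_max, limit)
```

The curve starts at 0 (no defectives to pass on), rises to the AOQL, and falls back toward 0 as poor lots are rejected and screened, which is the shape described for Figure 16.7.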

FIG. 16.7 - Illustration of the general appearance of an AOQ curve.
(Vertical axis: AOQ, with the AOQL marked; horizontal axis: p = lot
fraction defective.)

Let us now consider double sampling plans. An attributes double
sampling plan which operates on a lot-by-lot basis is completely defined
by six numbers: the lot size, N; the size of the first sample, $n_1$; the
acceptance number associated with the first sample, $a_1$; the rejection
number associated with the first sample, $r_1$; the size of the second
sample, $n_2$; and the acceptance number associated with the combined
samples, $a_2$. Such a plan operates as follows:

(1)  A sample of $n_1$ items is selected, by chance, from a lot of N
items.

(2)  Each item in the sample is then classified as either defective or
nondefective.

(3)  If the number of defective items in the sample does not exceed
$a_1$, the lot is accepted.

(4)  If the number of defective items in the sample equals or
exceeds $r_1$, the lot is rejected.

(5)  If the number of defective items in the sample exceeds $a_1$ but
is less than $r_1$, a second sample (of $n_2$ items) is selected, by
chance, from the remainder of the lot.

(6)  If the number of defective items in the two samples combined
does not exceed $a_2$, the lot is accepted.

(7)  If the number of defective items in the two samples combined
exceeds $a_2$, the lot is rejected.

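Rather than go into the same detail as was done for single sampling,
only the bare essentials will be presented. The OC function is

$$P_{acc} = \sum_{d_1=0}^{a_1} C(D, d_1)\cdot C(N - D,\, n_1 - d_1)/C(N, n_1)
 + \sum_{d_1=a_1+1}^{r_1-1} \left[ C(D, d_1)\cdot C(N - D,\, n_1 - d_1)/C(N, n_1) \right]
 \sum_{d_2=0}^{a_2-d_1} C(D - d_1, d_2)\cdot C(N - n_1 - D + d_1,\, n_2 - d_2)/C(N - n_1, n_2) \tag{16.5}$$

where $d_1$ represents the number of defective items in the first sample
and $d_2$ represents the number of defective items in the second sample.
Subject to the usual restrictions, this may be approximated by

$$P_{acc} \cong \sum_{d_1=0}^{a_1} C(n_1, d_1)\, p^{d_1}(1 - p)^{n_1-d_1}
 + \sum_{d_1=a_1+1}^{r_1-1} C(n_1, d_1)\, p^{d_1}(1 - p)^{n_1-d_1}
 \sum_{d_2=0}^{a_2-d_1} C(n_2, d_2)\, p^{d_2}(1 - p)^{n_2-d_2} \tag{16.6}$$

or

$$P_{acc} \cong \sum_{d_1=0}^{a_1} e^{-\lambda_1}\lambda_1^{d_1}/d_1!
 + \sum_{d_1=a_1+1}^{r_1-1} \left[ e^{-\lambda_1}\lambda_1^{d_1}/d_1! \right]
 \sum_{d_2=0}^{a_2-d_1} e^{-\lambda_2}\lambda_2^{d_2}/d_2! \tag{16.7}$$

where $\lambda_1 = n_1 p$ and $\lambda_2 = n_2 p$. As before, the average
outgoing quality is given by

$$\text{AOQ} = p\cdot P_{acc} \tag{16.8}$$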
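Equation (16.5) translates almost term by term into code. The sketch below is offered as an illustration only: the helper `hyper` and the function `oc_double` are hypothetical names, and the plan used in the demonstration (N = 100, $n_1$ = 5, $a_1$ = 0, $r_1$ = 2, $n_2$ = 5, $a_2$ = 1) is invented for the purpose.

```python
from math import comb


def hyper(N, D, n, d):
    """P(d defectives in a sample of n drawn from a lot of N holding D defectives)."""
    if d < 0 or d > n or D < 0 or D > N:
        return 0.0
    return comb(D, d) * comb(N - D, n - d) / comb(N, n)


def oc_double(N, n1, a1, r1, n2, a2, D):
    """Probability of acceptance for a double sampling plan, Equation (16.5)."""
    # Accepted outright on the first sample: d1 = 0, ..., a1.
    accept_first = sum(hyper(N, D, n1, d1) for d1 in range(a1 + 1))
    # Second sample taken when a1 < d1 < r1; accept if d1 + d2 <= a2.
    accept_second = 0.0
    for d1 in range(a1 + 1, r1):
        p_first = hyper(N, D, n1, d1)
        if p_first == 0.0:
            continue  # this first-sample outcome cannot occur
        p_second = sum(hyper(N - n1, D - d1, n2, d2)
                       for d2 in range(a2 - d1 + 1))
        accept_second += p_first * p_second
    return accept_first + accept_second


if __name__ == "__main__":
    for D in (0, 5, 10, 20, 40):
        print(D, oc_double(100, 5, 0, 2, 5, 1, D))
```

When $r_1 = a_1 + 1$ no second sample is ever drawn, and the function reduces to the single sampling OC function of Equation (16.1), which gives a convenient check.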

where $P_{acc}$ is defined by Equation (16.5). In addition, since the total
sample size is now a variable, it is also possible to assess the relative
"costs" of sampling plans by comparing their expected sample sizes.
For this purpose, it is customary to calculate the average sample number
(ASN). Proceeding on the assumption that every item in each sample
is inspected, the average sample number for a double sampling plan is
given by

$$\text{ASN} = n_1 + n_2 \sum_{d_1=a_1+1}^{r_1-1} C(D, d_1)\cdot C(N - D,\, n_1 - d_1)/C(N, n_1). \tag{16.9}$$

The graph of a typical ASN function for a double sampling plan is
shown in Figure 16.8, where it is compared with the constant sample
size of an equivalent (in terms of the OC function) single sampling
plan.
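Equation (16.9) says that the ASN is the first sample size plus the second sample size weighted by the probability that a second sample is needed. A minimal sketch follows; the function name `asn_double` and the plan in the demonstration are hypothetical.

```python
from math import comb


def asn_double(N, n1, a1, r1, n2, D):
    """Average sample number for a double sampling plan, Equation (16.9)."""
    # Probability that the first sample is inconclusive, i.e. a1 < d1 < r1.
    p_second_sample = sum(
        comb(D, d1) * comb(N - D, n1 - d1) for d1 in range(a1 + 1, r1)
    ) / comb(N, n1)
    return n1 + n2 * p_second_sample


if __name__ == "__main__":
    # Hypothetical plan: N = 100, n1 = 5, a1 = 0, r1 = 2, n2 = 5.
    for D in (0, 5, 10, 20, 40, 100):
        print(D, asn_double(100, 5, 0, 2, 5, D))
```

At the two extremes of quality (D = 0 and D = N) the first sample always decides the matter, so the ASN equals $n_1$; for intermediate quality it rises toward, but never exceeds, $n_1 + n_2$.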

Before proceeding to multiple sampling plans, it seems desirable to
list the advantages and disadvantages of double sampling plans
relative to single sampling plans. These are:

Advantages

(1)  They have the psychological advantage of giving doubtful
lots a second chance.

(2)  On the average, they have (for the two extremes of good and
bad quality) the advantage of requiring fewer inspections.

Disadvantages

(1)  They are said to be more difficult to administer. (Personally,
I do not subscribe to this point of view.)

(2)  The inspection load is variable.

(3)  The maximum number of inspections can (for intermediate
quality) exceed that for comparable single sampling plans.


FIG. 16.8 - Illustration of the general appearance of ASN functions
for single and double sampling plans that exhibit essentially the same
OC curve. (Vertical axis: ASN, with the single sampling plan shown at
the constant value n; horizontal axis: p = lot fraction defective.)

Since there is little, if anything, new in multiple sampling plans that
has not been discussed in connection with single and double sampling
plans, the following remarks will be very brief. If one extends the
concepts and procedures of double sampling to k > 2 samples, it may easily
be seen that such plans are completely defined by the numbers: N,
$n_1, \ldots, n_k$, $a_1, \ldots, a_k$, $r_1, \ldots, r_k = a_k + 1$.
Proceeding in a manner analogous to that followed for double sampling
plans, one may obtain OC, AOQ, and ASN functions. [NOTES: (1) The
calculations will, however, be quite involved and lengthy. (2) If each
$n_i = 1$ and if $k \to \infty$, the multiple plan is usually referred to
as a sequential plan.] For those persons desiring more details on multiple
and sequential acceptance sampling plans, the appropriate references at
the end of the chapter are recommended.

Example  16.4 

Suppose that lots of size 100 are submitted for acceptance. A sample
of size 2 is drawn from a lot and the lot is accepted if both the sample
items are nondefective; if one or both are defective, the lot is rejected.
The OC function for this plan is

$$P_{acc} = C(D, 0)\cdot C(100 - D,\, 2)/C(100, 2)$$

and this may be evaluated for D = 0, 1, ..., 100.
Example 16.5

With reference to Example 16.4, it is easily verified that the AOQ
function is

$$\text{AOQ} = p\cdot P_{acc} = [D/100][C(D, 0)\cdot C(100 - D,\, 2)/C(100, 2)]$$

where, once again, this may be evaluated for D = 0, 1, ..., 100.
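The evaluations called for in Examples 16.4 and 16.5 take only a few lines of code. This is an illustrative sketch; the function names `p_acc` and `aoq` are hypothetical, and the plan is exactly the one stated in Example 16.4 (N = 100, n = 2, a = 0).

```python
from math import comb


def p_acc(D):
    """OC function of Example 16.4: accept only if both sampled items are good."""
    return comb(D, 0) * comb(100 - D, 2) / comb(100, 2)


def aoq(D):
    """AOQ function of Example 16.5, with p = D/100."""
    return (D / 100) * p_acc(D)


# One row per lot quality: (D, P_acc, AOQ) for D = 0, 1, ..., 100.
table = [(D, p_acc(D), aoq(D)) for D in range(101)]
worst = max(table, key=lambda row: row[2])  # row at which the AOQ is largest

if __name__ == "__main__":
    for D, pa, q in table[::10]:
        print(D, round(pa, 4), round(q, 4))
    print(worst)
```

For this plan the AOQ is largest at D = 33, where it is about 0.147; that value is the AOQL of the plan.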

Problems 

16.1  Define and discuss each of the following terms: (a) quality, (b)
control, (c) quality control, (d) statistical quality control.

16.2  Why should both the $\bar{X}$ chart and the R chart be used when
dealing with measurement (as opposed to attributes) data?

16.3  A control chart for $\bar{X}$ has the upper control limit (UCL) and
the lower control limit (LCL) equal to 24.23 and 23.99, respectively.
The calculations were based on samples of size four. If the engineering
specifications had been given as 24.10 ± 0.20, what percentage of the
sample means would you expect to plot out of control? State any
assumptions you find it necessary to make.

16.4  Given  the  following  data,   construct  and  interpret  the  appropriate 
control  charts. 

SAMPLE MEANS AND RANGES (n = 10) FOR LENGTH
OF LIFE OF LIGHT BULBS
(CODED DATA)

Sample No.     Mean ($\bar{X}$)     Range (R)

     1              69.4               45
     2              63.4               48
     3              55.0               72
     4              64.0               48
     5              57.4               36
     6              82.0               81
     7              85.0               78
     8              33.4               42
     9              46.0               69
    10             112.4               84
    11              93.8               48
    12              95.6               75
    13             117.8               51
    14             113.6               84
    15              74.8               54
    16              80.8               45
    17              71.8               57
    18              53.2               75
    19              74.8               48
    20              59.2               63
    21              65.8              129
    22             109.6               42
    23              44.2               51
    24              73.6               51
    25              51.4               27

 Total            1848.0             1503
16.5  Given the following sample data, construct control charts for the
sample mean and the sample range, that is, $\bar{X}$ and R charts, and
interpret the results. (One entry, marked [illegible], could not be
recovered.)

Sample               1    2    3    4    5    6    7    8            9

Individual values   +2    0   +3   +1   -1   -1    0    0           +2
                    -1    0   +1   -3    0    0    0   +2            0
                    +1   -3   +4    0   +1   -1   +3   -2            0
                     0   +2   -2   +1    0   -2   -1   [illegible]  +1
                     0   -3   -1   -1   +1   -3   +3    0           +1

Sample              10   11   12   13   14   15   16   17   18

Individual values    0   +1    0   -2   +2   +2    0    0   -2
                    +3   -2    0   -3   -1   -2   +4    0   -1
                    +2   +1   -2   +2    0   +1   -4   -1   +2
                    +1   +1   -2   -1   -2   -1    0    0   -1
                    +1   +3    0   -5   -1    0   +3   +2    0

Sample              19   20   21   22   23   24   25

Individual values   -2   +2    0    0   -1    0   +1
                     0   -2   -1   +2   -1   +1   -1
                    -1   -1   -1   +1   +2   +1   -4
                    -1   +2   -1   +1   -1   -3   -1
                     0   -1   +1   +1   +1   -2   +1
16.6  Construct and interpret the appropriate control charts for the
following data.

Group No.    (a)     (b)     (c)     (d)     (e)     $\bar{X}$     R

    1       .831    .829    .836    .840    .826      .8324      .014
    2       .834    .826    .831    .831    .831      .8306      .008
    3       .836    .826    .831    .822    .816      .8262      .020
    4       .833    .831    .835    .831    .833      .8326      .004
    5       .830    .831    .831    .833    .820      .8290      .013
    6       .829    .828    .828    .832    .841      .8316      .013
    7       .835    .833    .829    .830    .841      .8336      .012
    8       .818    .838    .835    .834    .830      .8310      .020
    9       .841    .831    .831    .833    .832      .8336      .010
   10       .832    .828    .836    .832    .825      .8306      .011
   11       .831    .838    .844    .827    .826      .8332      .018
   12       .831    .826    .828    .832    .827      .8288      .006
   13       .838    .822    .835    .830    .830      .8310      .016
   14       .815    .832    .831    .831    .838      .8294      .023
   15       .831    .833    .831    .834    .832      .8322      .003
   16       .830    .819    .819    .844    .832      .8288      .025
   17       .826    .839    .842    .835    .830      .8344      .016
   18       .813    .833    .819    .834    .836      .8270      .023
   19       .832    .831    .825    .831    .850      .8338      .025
   20       .831    .838    .833    .831    .833      .8332      .007
   21       .823    .830    .832    .835    .835      .8310      .012
   22       .835    .829    .834    .826    .828      .8304      .009
   23       .833    .836    .831    .832    .832      .8328      .005
   24       .826    .835    .842    .832    .831      .8332      .016
   25       .833    .823    .816    .831    .838      .8282      .022
   26       .829    .830    .830    .833    .831      .8306      .004
   27       .850    .834    .827    .831    .835      .8354      .023
   28       .835    .846    .829    .833    .822      .8330      .024
   29       .831    .832    .834    .826    .833      .8312      .008
16.7  Using the data below, calculate limits and plot the $\bar{X}$ and $R$
charts. Apply the standard tests for unnatural patterns and discuss the
results. (NOTE: The sample size is n = 5.)

Sample     X̄       R        Sample     X̄       R

   1     1.444    .09         26     1.424    .05
   2     1.427    .08         27     1.434    .05
   3     1.464    .08         28     1.414    .09
   4     1.455    .08         29     1.406    .07
   5     1.462    .10         30     1.418    .14
   6     1.448    .05         31     1.438    .09
   7     1.454    .04         32     1.416    .07
   8     1.446    .08         33     1.419    .06
   9     1.437    .12         34     1.406    .08
  10     1.471    .11         35     1.428    .06
  11     1.438    .09         36     1.430    .06
  12     1.438    .05         37     1.421    .07
  13     1.415    .12         38     1.434    .07
  14     1.428    .12         39     1.408    .05
  15     1.425    .08         40     1.414    .08
  16     1.440    .09         41     1.410    .03
  17     1.430    .05         42     1.406    .06
  18     1.457    .09         43     1.405    .07
  19     1.444    .09         44     1.419    .10
  20     1.432    .05         45     1.410    .07
  21     1.438    .05         46     1.420    .05
  22     1.404    .10         47     1.414    .05
  23     1.409    .05         48     1.426    .11
  24     1.400    .07         49     1.386    .06
  25     1.425    .09         50     1.387    .08
16.8  Following are the number of defective piston rings inspected in 23
daily samples of 100. Calculate the control limits for the type of
control chart that should be used with these data. Interpret the
results.

Date    Number Defective        Date    Number Defective

  1            9                 17            3
  2            5                 18            6
  3           10                 19            3
  4           10                 22            3
  5           13                 23            3
  8           10                 24            0
  9           13                 25            5
 10            2                 26            5
 11            1                 29            3
 12            3                 30            2
 15            2                 31            4
 16            2
16.9  Given the following data, construct the appropriate control chart.
Interpret the results.

NUMBER OF DEFECTIVES IN SAMPLES OF SIZE 100

Sample No.   Number Defective (d)      Sample No.   Number Defective (d)

     1               3                     21               3
     2               1                     22               4
     3               4                     23               6
     4               4                     24               4
     5               4                     25               3
     6               6                     26               5
     7               5                     27               4
     8               5                     28               7
     9               2                     29               6
    10               4                     30               5
    11               3                     31               5
    12               4                     32               6
    13               4                     33               4
    14               3                     34               9
    15               5                     35               6
    16               8                     36               4
    17               2                     37               3
    18               3                     38               1
    19               5                     39               3
    20               4                     40               1
16.10  At a certain point in the assembly process, TV sets are subjected to
a critical inspection. The following data resulted from the inspection
of 25 randomly selected sets. Plot and interpret the appropriate control
chart.

Set No.   Number of Defects      Set No.   Number of Defects
               per Set                          per Set

   1             12                 14              7
   2             11                 15              4
   3             13                 16              9
   4             17                 17             15
   5              7                 18             12
   6             10                 19             11
   7              9                 20              9
   8              8                 21              4
   9             13                 22              4
  10             15                 23             10
  11             18                 24             12
  12              9                 25              4
  13             14

16.11  Phonograph records were selected at random times from a production
line. Given the following data, construct and interpret the appropriate
chart.

Record No.   Number of Defects      Record No.   Number of Defects
                per Record                          per Record

     1               1                  16              20
     2               1                  17               1
     3               3                  18               6
     4               7                  19              12
     5               8                  20               4
     6               1                  21               5
     7               2                  22               1
     8               6                  23               8
     9               1                  24               7
    10               1                  25               9
    11              10                  26               2
    12               5                  27               3
    13               0                  28              14
    14              19                  29               6
    15              16                  30               8

16.12  (a) Plot the OC curve for Example 16.4.
       (b) Determine the AQL and RQL points.

16.13  (a) Plot the AOQ curve for Example 16.5.
       (b) Determine the AOQL.

16.14  Discuss the basic purpose of acceptance sampling and the general
methods by which this purpose is approached.

16.15  Plot, on the same graph, the following OC curves:
       (a) N = 500, n = 50, a = 0
       (b) N [...]
       (c) N [...]

16.16  Plot, on the same graph, the following OC curves:
       (a) N = 500, n = 25, a = 1
       (b) N = 500, n = 50, a = 1
       (c) N = 500, n = 100, a = 1

16.17  For the double sampling plan specified by $N = 500$, $n_1 = 50$,
$n_2 = 100$, $a_1 = 5$, $r_1 = 14$, and $a_2 = 13$, determine and plot the
OC, AOQ, and ASN functions. Find the value of the AOQL.

16.18  The following truncated sequential sampling plan is proposed:
       (a) If the first item is defective, reject the lot;

       (b) If the first item is nondefective and there is no more than one
           defective up to and including the tenth item, accept the lot;

       (c) If the first item is nondefective and there are two defectives
           prior to or including the tenth item, reject the lot.

Determine the OC and ASN functions.
CHAPTER 17

SOME  OTHER  TECHNIQUES  AND 
APPLICATIONS 

IN THE PRECEDING CHAPTERS those techniques most frequently employed
by users of statistical methods have been presented as integral
parts of a complete discipline. In this chapter a few special techniques
will be discussed and one or two interesting applications of probability
and/or statistics illustrated. It is hoped that these items will prove
useful to many readers and that they will stimulate some of you to
search out new and exciting applications in your own area of
specialization.

17.1      SOME  PSEUDO  t  STATISTICS

Many times, the researcher may feel that the time and effort involved
in the calculation of $s$, the sample standard deviation, is too
great for the benefit derived therefrom. Thus, it is not surprising that
techniques have been devised which use $R$, the sample range, in place
of $s$. One of these techniques involves the use of a pseudo t statistic
defined by

$$\tau_1 = (\bar{X} - \mu)/R. \tag{17.1}$$

If we are willing to assume random sampling from a normal population,
tests of hypotheses concerning $\mu$ or confidence interval estimation
of $\mu$ may be performed by considering the sampling distribution of
$\tau_1$ and utilizing the values recorded in Table 1 of Appendix 17.

Example  17.1 

Consider again the data and the hypothesis of Example 7.8. Using
Equation (17.1), we obtain $\tau_1 = (1267 - 1260)/8 = 0.875$. Since this
calculated value of $\tau_1$ exceeds the critical value (for $n = 4$),
namely, $\tau_{1(.975)} = .717$, the hypothesis that $\mu = 1260$ is
rejected. (NOTE: This is in agreement with the conclusion reached in
Example 7.8.)

Example  17.2 

Utilizing the same data as in the preceding example, a 99 per cent
confidence interval for $\mu$ may be obtained as follows:

$$L = \bar{X} - \tau_{1(.995)} R = 1267 - 1.316(8) = 1256.47$$
$$U = \bar{X} + \tau_{1(.995)} R = 1267 + 1.316(8) = 1277.53.$$
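The arithmetic of Examples 17.1 and 17.2 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the critical values .717 and 1.316 are simply the ones quoted above for n = 4; for any other sample size they would have to be read from the appendix table.

```python
# Range-based pseudo t statistic, following Equation (17.1).
# Function names are illustrative; critical values are those quoted
# in Examples 17.1 and 17.2 for n = 4.

def tau1(xbar, mu0, sample_range):
    """Pseudo t statistic: (sample mean - hypothesized mean) / sample range."""
    return (xbar - mu0) / sample_range

def tau1_interval(xbar, sample_range, critical):
    """Two-sided confidence limits: xbar -/+ critical * range."""
    half_width = critical * sample_range
    return xbar - half_width, xbar + half_width

xbar, mu0, r = 1267.0, 1260.0, 8.0

stat = tau1(xbar, mu0, r)                     # 0.875
reject = abs(stat) > 0.717                    # two-sided test at the 5 per cent level
lower, upper = tau1_interval(xbar, r, 1.316)  # 99 per cent limits

print(f"tau_1 = {stat:.3f}, reject H0: {reject}")
print(f"99% interval: ({lower:.2f}, {upper:.2f})")
```

The only inputs are the sample mean, the sample range, and a tabulated critical value, which is the whole appeal of the method: no sum of squares is ever computed.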

An  even  simpler  statistic  which  may  be  used  as  a  substitute  for 
"Student's"  t  is 

(17.2)

Critical values of this statistic are given in Table 3 of Appendix 17.
Because of the similarity (in use) of this statistic to the one discussed
in the preceding paragraph, no numerical examples will be presented.
When dealing with the difference between two sample means, a third
pseudo t statistic may be utilized, namely,

$$\tau_d = (\bar{X}_1 - \bar{X}_2)/(R_1 + R_2). \tag{17.3}$$

Critical values of $\tau_d$ are tabulated, for samples of equal size, in
Table 2 of Appendix 17.

Example  17.3 

Consider the data and hypothesis of Example 7.19. Using Equation
(17.3), we obtain $\tau_d = (4 - 7)/(7 + 7) = -3/14 = -0.214$. Since the
calculated value of $\tau_d$ lies between $-\tau_{d(.95)} = -.246$ and
$\tau_{d(.95)} = .246$, we are unable to reject the hypothesis that
$\mu_1 = \mu_2$. (NOTE: This agrees with the conclusion reached in
Example 7.19.)
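The two-sample version is equally short. The sketch below reproduces the numbers of Example 17.3; the function name is hypothetical and the critical value .246 is the one quoted in the example, not a computed quantity.

```python
# Two-sample pseudo t statistic, following Equation (17.3).
# Valid (per the text) for samples of equal size from normal populations.

def tau_d(xbar1, xbar2, range1, range2):
    """Difference of sample means divided by the sum of the sample ranges."""
    return (xbar1 - xbar2) / (range1 + range2)

stat_d = tau_d(4.0, 7.0, 7.0, 7.0)   # -3/14
critical_d = 0.246                   # tabulated value quoted in Example 17.3
reject_d = abs(stat_d) > critical_d  # False: the hypothesis is not rejected

print(f"tau_d = {stat_d:.3f}, reject H0: {reject_d}")
```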

17.2     A  PSEUDO  F  STATISTIC 

As might be expected, it is also possible to utilize the range in place
of the standard deviation to provide a quick substitute for the familiar
F statistic. The suggested statistic is

$$R_1/R_2 = \text{the ratio of the two sample ranges}, \tag{17.4}$$

and critical values are tabulated, for certain selected sample sizes, in
Table 4 of Appendix 17. As with F, the critical values for the lower end
of the distribution of $R_1/R_2$ may be found by interchanging $n_1$ and
$n_2$ (the two sample sizes) and calculating the reciprocals of the
tabulated values. Because of the simplicity of this test and its
similarity to those discussed in the preceding section, no numerical
examples will be given.

17.3      EVOLUTIONARY  OPERATION 

In Section 13.6 the reader was exposed to, but not indoctrinated in,
the subject known as "response surface techniques." In the present
section, a related technique will be introduced. This technique, known
as evolutionary operation (or EVOP), is an application of the concepts
of response surface methodology to the problem of improving the
performance of industrial processes. Consequently, this technique should
be of great interest to those persons concerned with production
processes.

Because no detailed discussion of response surface methodology was
undertaken in Chapter 13, a full description of EVOP cannot be given
here. However, in capsule form, the important elements of the technique
are:

(1)  It is a method of process operation which includes a built-in
procedure for improving productivity.



(2)  It uses some relatively simple statistical concepts.

(3)  It is run during normal routine production by plant personnel.

(4)  The basic philosophy of EVOP is: A process should be run
not only to produce product but also to provide information on
how to improve the process and/or product.

(5)  Through the planned introduction of minor variants into the
process, the customary "static" operating conditions are
made "dynamic" in nature.

(6)  Utilizing the elementary principles of response surface
methodology and making small changes (in a prescribed fashion)
in the "controlled" variables of the process, the effects of the
forced changes can be assessed.

(7)  If a simple pattern of operating conditions is employed (e.g.,
a $2^2$ factorial plus a center point)¹ and if the operation of the
process under each of these conditions is termed a "cycle,"
the running of several cycles will yield sufficient information to
permit a judgment to be made as to what is a better nominal
set of operating conditions.

(8)  Constant repetition of this program will lead to continual
improvement of the process.

(9)  Two important items in an EVOP program are:

(a)  All data should be prominently displayed on an Information
Board.

(b)  An Evolutionary Operation Committee (composed of research,
development, and production personnel) should make periodic
reviews of the EVOP program if the maximum benefit is to be
derived from this approach.

Inasmuch as the preceding remarks are only a skeletal description
of the EVOP technique, those persons desiring more details must consult
other sources. In particular, the publications by Box (3) and Box
and Hunter (4) are especially recommended.

17.4     TOLERANCES 

In Section 5.14 some remarks were made with regard to the distribution
of a linear combination of random variables. At this time, it is
appropriate to consider a specific application of those remarks, namely,
to the subject of tolerances.

Before  embarking  on  this  discussion,  a  distinction  must  be  made 
between  specification  limits  (that  is,  a  nominal  value  plus  and  minus 
certain  engineering  tolerances)  and  natural,  or  statistical,  tolerances.  In 
general,  specification  limits  are  set  by  the  designer  as  a  statement  of 
his  requirements  with  respect  to  a  certain  dimension.  Thus,  in  many 
cases,  the  specification  limits  have  little  connection  with  production 
capabilities. On the other hand, statistical tolerance limits reflect the
capabilities of the process producing the dimensions in question. (See
Sections 6.11, 6.12, and 6.13.) Therefore, from the point of view of
managerial decision making, the subject of statistical tolerances is of
great importance, for it bears directly on the success or failure of the
company's product.

¹ This center point will usually be that nominal set of operating
conditions specified in the engineering drawings or production manual.

In what follows, we are interested only in general concepts and a
method of approach that may prove helpful in the design of complex
equipment. Accordingly, our attention will be confined to those cases
in which the several variables (dimensions) are normally and
independently distributed with known means and variances. Restrictive
as these assumptions may be, they will not seriously limit our
presentation. [NOTE: If other distributions must be used, the same
concepts will apply. For example, see Breipohl (5).]

Two cases will be examined: (1) linear combinations of the variables
and (2) nonlinear combinations of the variables.

Case I: Linear Combinations of Independent Random
Variables

This case was covered in Section 5.14 where it was noted that if

$$U = \sum_{i=1}^{n} a_i X_i, \tag{17.5}$$

then

$$\mu_U = \sum_{i=1}^{n} a_i \mu_i \tag{17.6}$$

and

$$\sigma_U^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2 \tag{17.7}$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance, respectively, of
$X_i$ ($i = 1, \ldots, n$). If, as is true in many applications, each
$a_i = 1$, then Equations (17.6) and (17.7) reduce to

$$\mu_U = \sum_{i=1}^{n} \mu_i \tag{17.8}$$

and

$$\sigma_U^2 = \sum_{i=1}^{n} \sigma_i^2. \tag{17.9}$$

Because the above results are used in a variety of ways, a complete
discussion of each different situation is not planned. However, some
typical examples will be presented to acquaint the reader with a few of
the possible applications.
Example 17.4

Consider a simple addition of components such that the dimension of
the assembly ($Y$) is the sum of the dimensions of the individual
components, that is, $Y = \sum_{i=1}^{n} X_i$. If there are two
components, $X_1$ and $X_2$, with means 0.500 and 0.410 and with standard
deviations 0.008 and 0.006, respectively, then $\mu_Y = 0.910$ and
$\sigma_Y = [(.008)^2 + (.006)^2]^{1/2} = 0.01$. If $X_1$ and $X_2$ are
assumed to be normally distributed, then $Y$ is normally distributed and
the distribution (of $Y$) may be compared with the specification limits
(for $Y$) to obtain the expected percentage of defective assemblies.

Example  17.5 

Consider an assembly consisting of five components for which
$Y = \sum_{i=1}^{5} X_i$ represents the simple addition of the appropriate
dimensions. If the five dimensions are normally and independently
distributed with means 0.500, 0.410, 0.200, 0.700, and 0.210, respectively,
and if they may be assumed to have a common variance, $\sigma^2$, how
large can $\sigma$ be if $\mu_Y \pm 3\sigma_Y = 2.020 \pm 0.030$ units?
Using Equation (17.9), it may be verified that $\sigma_Y^2 = 5\sigma^2$.
Therefore, $3\sigma_Y = 3\sigma\sqrt{5} = 0.030$, which means that
$5\sigma^2 = (0.01)^2 = 0.0001$, and thus $\sigma^2 = 0.00002$.
Consequently, the maximum allowable value of $\sigma$ is
$(0.00002)^{1/2} = 0.0045$.

Example  17.6 

Another illustration of a linear combination of dimensions is the
clearance between a shaft and a bearing. Let us assume that shafts are
mass produced such that the outside diameters are normally and
independently distributed with $\mu_S = 1.05$ inches and with standard
deviation $\sigma_S$. Let us also assume that bearings are produced such
that the inside diameters are normally and independently distributed with
$\mu_B = 1.06$ inches and with standard deviation $\sigma_B = 0.001$ inch.
If production of the shafts is to be controlled so that no more than 5 per
cent of randomly mated shafts and bearings will exhibit interference, what
is the maximum allowable value of $\sigma_S$?

To answer this question, consider $Y = B - S$. Using Equations (17.6)
and (17.7), it may be verified that $\mu_Y = 1.06 - 1.05 = 0.01$ inch and
that $\sigma_Y = [(0.001)^2 + \sigma_S^2]^{1/2}$ inch. Now, interference
will occur when $Y < 0$. Thus, it is appropriate to consider

$$P\{Y < 0\} = P\{Z < (0 - 0.01)/\sigma_Y\} = 0.05$$

where $Z = (Y - \mu_Y)/\sigma_Y$. On consulting Appendix 3, it is found
that $-0.01/\sigma_Y = -1.645$ and, consequently,
$\sigma_Y^2 = 0.000001 + \sigma_S^2 = (1/164.5)^2 = 0.000037$. As a result,
it is seen that the maximum $\sigma_S$ is 0.006.
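The steps of Example 17.6 can be retraced numerically. The sketch below is one way to do it with Python's standard library; the variable names are assumptions made for the illustration, and `NormalDist().inv_cdf` takes the place of the normal table in Appendix 3.

```python
# Example 17.6 retraced: largest shaft standard deviation that keeps the
# interference probability P{Y < 0} at or below 5 per cent, where Y = B - S.
from math import sqrt
from statistics import NormalDist

mu_b, mu_s = 1.06, 1.05    # mean bearing and shaft diameters (inches)
sigma_b = 0.001            # bearing standard deviation (inches)
max_interference = 0.05    # allowed probability of interference

mu_y = mu_b - mu_s                          # mean clearance, 0.01 inch
z = NormalDist().inv_cdf(max_interference)  # lower 5 per cent point, about -1.645
sigma_y_max = (0.0 - mu_y) / z              # from (0 - mu_Y)/sigma_Y = z
sigma_s_max = sqrt(sigma_y_max**2 - sigma_b**2)

# Check: with this sigma_S the interference probability is back at 0.05.
check = NormalDist(mu_y, sqrt(sigma_b**2 + sigma_s_max**2)).cdf(0.0)

print(f"max sigma_S = {sigma_s_max:.4f} inch, P(interference) = {check:.3f}")
```

Changing `max_interference` shows at once how sharply the allowable shaft variability shrinks as the interference requirement is tightened.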

It  is  hoped  that  the  preceding  examples  will  be  sufficient  to  indicate 
the  nature  and  scope  of  the  theory  of  statistical  tolerances  when  dealing 
with  linear  combinations  of  independent  random  variables.  For  those 
who  wish  to  investigate  the  subject  in  greater  depth,  more  details  and 
examples  are  available  in  Bowker  and  Lieberman  (2)  and  Breipohl  (5). 
Case II: Nonlinear Combinations of Independent Random
Variables

When variables are combined in a nonlinear fashion, the tolerance
problem is usually much more difficult. The reason for this difficulty is
that, in general, it is not easy to determine the distribution of a
nonlinear combination of random variables. However, if a linear
approximation is acceptable, the problem may be handled by expanding the
function in a Taylor series about the mean values. That is, if

$$Y = \phi(X_1, \ldots, X_n), \tag{17.10}$$

it may be shown that

$$Y = \phi(\mu_1, \ldots, \mu_n) + \sum_{i=1}^{n} (X_i - \mu_i) \left[\frac{\partial \phi}{\partial X_i}\right]_{\mu_1, \ldots, \mu_n} + \text{terms of higher order}. \tag{17.11}$$

Then if the "terms of higher order" are neglected, it may be verified
that

$$\mu_Y \cong \phi(\mu_1, \ldots, \mu_n) \tag{17.12}$$

and

$$\sigma_Y^2 \cong \sum_{i=1}^{n} \left[\frac{\partial \phi}{\partial X_i}\right]^2_{\mu_1, \ldots, \mu_n} \sigma_i^2 \tag{17.13}$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance, respectively, of
$X_i$ ($i = 1, \ldots, n$). As a consequence, Equations (17.12) and (17.13)
may be used to obtain approximate tolerance limits for $Y$.

Example  17.7 

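Suppose that $Y = X_1 X_2 X_3$. Expanding this function in a Taylor series
about $\mu_1$, $\mu_2$, and $\mu_3$, we obtain

$$Y = \mu_1\mu_2\mu_3 + \mu_2\mu_3(X_1 - \mu_1) + \mu_1\mu_3(X_2 - \mu_2) + \mu_1\mu_2(X_3 - \mu_3) + \text{terms of higher order}.$$

If the $X_i$ are normally and independently distributed with mean $\mu_i$
and variance $\sigma_i^2$, then $Y$ is approximately normally distributed
with mean $\mu_Y \cong \mu_1\mu_2\mu_3$ and variance
$\sigma_Y^2 = (\mu_2\mu_3)^2\sigma_1^2 + (\mu_1\mu_3)^2\sigma_2^2 + (\mu_1\mu_2)^2\sigma_3^2$.
These expressions may then be used to investigate the natural tolerances
of $Y$.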
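The first-order (Taylor series) approximation lends itself to a small general-purpose routine. The sketch below is an illustration only: the function `propagate`, the step size `h`, and the numerical means and standard deviations are all hypothetical choices, and the partial derivatives are approximated by central differences rather than derived by hand. It is applied to the product of Example 17.7 and compared with the closed-form variance given there.

```python
# First-order propagation of means and variances through Y = phi(X1, ..., Xn),
# assuming independent X's. Partial derivatives are approximated numerically.

def propagate(phi, means, sigmas, h=1e-6):
    """Return (approximate mean, approximate variance) of phi at the given means."""
    mean_y = phi(*means)
    var_y = 0.0
    for i, (m, s) in enumerate(zip(means, sigmas)):
        up = list(means)
        dn = list(means)
        up[i] = m + h
        dn[i] = m - h
        slope = (phi(*up) - phi(*dn)) / (2.0 * h)  # d(phi)/dX_i at the means
        var_y += (slope * s) ** 2
    return mean_y, var_y

# Hypothetical values, chosen only for illustration.
mu = (2.0, 3.0, 4.0)
sigma = (0.01, 0.02, 0.03)

mean_y, var_y = propagate(lambda x1, x2, x3: x1 * x2 * x3, mu, sigma)

# Closed form from Example 17.7 for comparison.
closed = ((mu[1] * mu[2] * sigma[0]) ** 2
          + (mu[0] * mu[2] * sigma[1]) ** 2
          + (mu[0] * mu[1] * sigma[2]) ** 2)

print(f"mean = {mean_y:.4f}, variance = {var_y:.6f}, closed form = {closed:.6f}")
```

Because the routine only needs to evaluate the function, the same call works for any smooth combination of the dimensions, not just a product.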

Example  17.8 

Consider an electrical circuit consisting of two resistors connected in
parallel. For such a circuit, $R_c = R_1 R_2/(R_1 + R_2)$ where
$R_i$ ($i = 1, 2$) signify the resistances of the two resistors connected
in parallel and $R_c$ is the circuit resistance. Using the Taylor series
approximation, it is easily verified that

$$\mu_c \cong \mu_1\mu_2/(\mu_1 + \mu_2) \quad \text{and} \quad \sigma_c^2 \cong \left[\frac{\mu_2}{\mu_1 + \mu_2}\right]^4 \sigma_1^2 + \left[\frac{\mu_1}{\mu_1 + \mu_2}\right]^4 \sigma_2^2$$

where the subscripts conform with the notation used above for the
R-values. These equations may then be used to study the relationships
among the tolerances of the resistors and the tolerances of the circuit.
17.5     THE   ESTIMATION   OF  SYSTEM    RELIABILITY 

The general problem of estimating the reliability of systems has
received considerable attention in recent years. Stated briefly, the
problem is as follows: Given probabilities for the successful operation
of the various components utilized in a system, estimate the probability
that the system will operate successfully.

Before attempting to present the solution of a problem such as posed in
the preceding paragraph, it will be wise to adopt some standard notation.
Thus, in what follows, we shall use:

$p_i$ = the probability that the ith component in the system will
operate successfully,

$q_i$ = the probability that the ith component in the system will fail
to operate successfully = $1 - p_i$,

$P$ = the probability that the system will operate successfully, and

$Q$ = the probability that the system will fail to operate successfully
= $1 - P$.

Given this notation and assuming that n components are utilized in the
system, we may write

$$P = f(p_1, \ldots, p_n) \tag{17.14}$$

where the form of the function will depend on the nature of the system.
If the system under consideration is either a simple series system (in
which all components must work for the system to succeed) or a parallel
system (in which at least one of the components must work for the
system to succeed), and if the various components are statistically
independent in their operation, the functional form is easy to determine
and the basic equations are as follows:

Series System

$$P = \prod_{i=1}^{n} p_i = \prod_{i=1}^{n} (1 - q_i) \tag{17.15}$$

$$Q = 1 - \prod_{i=1}^{n} p_i = 1 - \prod_{i=1}^{n} (1 - q_i) \tag{17.16}$$

Parallel System

$$Q = \prod_{i=1}^{n} q_i = \prod_{i=1}^{n} (1 - p_i) \tag{17.17}$$

$$P = 1 - \prod_{i=1}^{n} q_i = 1 - \prod_{i=1}^{n} (1 - p_i). \tag{17.18}$$

If the system involves both series and parallel features, an equivalent
system (or circuit) can always be found that will permit utilization of
the preceding equations.

Example  17.9 

Consider a system, consisting of four components connected in
series, in which all components must operate properly if the system is
to function properly. If the respective failure probabilities of the four
components are 0.02, 0.03, 0.05, and 0.02, then
$P = (0.98)(0.97)(0.95)(0.98)$ and $Q = 1 - P$.

Example  17.10 

Consider a system, consisting of four components connected in
parallel, in which at least one of the components must operate properly
if the system is to function properly. If the respective failure
probabilities of the four components are 0.05, 0.10, 0.02, and 0.001, then
$Q = (0.05)(0.10)(0.02)(0.001)$ and $P = 1 - Q$.
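The products in Examples 17.9 and 17.10 are easily evaluated. A minimal sketch follows, assuming statistically independent components as the text requires; the function names are hypothetical, and `math.prod` needs Python 3.8 or later.

```python
# Reliability of simple series and parallel systems of independent
# components, given the component failure probabilities q_i.
from math import prod

def series_reliability(failure_probs):
    """All components must work: P = product of (1 - q_i)."""
    return prod(1.0 - q for q in failure_probs)

def parallel_reliability(failure_probs):
    """At least one component must work: P = 1 - product of q_i."""
    return 1.0 - prod(failure_probs)

p_series = series_reliability([0.02, 0.03, 0.05, 0.02])       # Example 17.9
p_parallel = parallel_reliability([0.05, 0.10, 0.02, 0.001])  # Example 17.10

print(f"series:   P = {p_series:.4f}, Q = {1.0 - p_series:.4f}")
print(f"parallel: P = {p_parallel:.7f}, Q = {1.0 - p_parallel:.1e}")
```

A system with both series and parallel branches can be handled by nesting the two functions, replacing each branch by a single equivalent component.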

Brief as the preceding treatment has been, I hope that it has been
sufficient to acquaint the reader with the nature of the problem and
with the method of solution. Some of the problems at the end of the
chapter will require the extension of the basic principles to more
complex situations while others will introduce certain approximations
that are often used by reliability analysts.

Before leaving this topic, I would be derelict in my duty if I did not
call a number of specific points to your attention. These are:

(1)  The assumption of statistical independence is frequently
subject to question.

(2)  Rarely does the analyst know the true values of the $p_i$ and $q_i$.
This means that he is really using $\hat{p}_i$ and
$\hat{q}_i = 1 - \hat{p}_i$ and thus
$\hat{P} = f(\hat{p}_1, \ldots, \hat{p}_n)$ and $\hat{Q} = 1 - \hat{P}$
are actually point estimates of the unknown parameters, $P$ and $Q$.

(3)  While confidence limits for the true $p_i$ and $q_i$ are easily
obtained (see Chapter 6), no satisfactory methods have yet been
developed for providing confidence limits for $P$ and $Q$.
(NOTE: A few special cases have been solved but the general
case is still under investigation.)

(4)  Frequently, the analyst must consider more than just success
or failure. For example, in nuclear weaponry the possibility of
premature operation must also be considered. Thus, for each
device, and for the system as a whole, we now have three
probabilities to contend with. Apart from the added complexity
of the mathematics involved, this can lead to great
difficulties in the (logical) analysis of the system.

(5)  In many applications, the probabilities $p_i$ and $q_i$ will be
functions of time, that is, the reliability analyst will be dealing
with $p_i(t)$ and $q_i(t)$ where $t$ represents operating time or age.
When such is the case, the problems of analysis are complicated
by such phenomena as early failures and wearout.

The preceding are just a few of the points that plague a reliability
analyst as he goes about his daily work. I am certain that you could,
with a little reflection, add many other items to the above list.
However, such a list would be out of place in this book. The items
mentioned, though, are distinctly statistical in nature and thus it seems
appropriate to call them to your attention at this time.

Problems 

17.1  Using first $\tau_1$ and then $\tau_2$ (as defined in Section 17.1),
rework the following problems:

      (a) 6.1       (f) 7.1       (j) 7.6
      (b) 6.5       (g) 7.2       (k) 7.29
      (c) 6.20      (h) 7.[?]     (l) 7.32
      (d) 6.22      (i) 7.5       (m) 7.33
      (e) 6.23

17.2  Using $\tau_d$ (as defined in Section 17.1), rework the following
problems:

      (a) 7.25      (e) 7.36
      (b) 7.27      (f) 7.37
      (c) 7.30      (g) 7.39
      (d) 7.31      (h) 7.40

17.3  Using the pseudo F statistic defined in Section 17.2, rework the
following problems:

      (a) 6.24      (d) 7.46
      (b) 6.26      (e) 7.47
      (c) 7.45

17.4  Rework Example 17.5 assuming that $\mu_Y \pm 5\sigma_Y = 2.020 \pm 0.030$
units.

17.5  Rework Example 17.6 assuming that no more than 1 per cent of
randomly mated bearings and shafts may exhibit interference.

17.6  Evaluate the results of Example 17.7 if $\mu_1 = 40$, $\mu_2 = 0.5$,
$\mu_3 = 3$, $3\sigma_1 = 0.5$, $3\sigma_2 = 0.005$, and $3\sigma_3 = 0.06$.

17.7  If $Y = X \cos\theta + UVW \sin\theta$, where $X$, $U$, $V$, $W$, and
$\theta$ are random variables, use a Taylor's expansion about the means to
determine approximate expressions for the mean and variance of $Y$.

17.8  Assuming that each random variable is normally distributed with a
mean equal to the nominal value and with a standard deviation equal
to one-third of the (one-sided) engineering tolerance, evaluate the
results determined in Problem 17.7, for the following specifications:


      U = 40 ± 0.5
      V = 0.5 ± 1 per cent
      W = 3 ± 2 per cent
      θ = 60 ± 0.25 degrees.

17.9  Two  resistors  are  assembled  in  series.  Each  is  nominally  rated  at  20 
ohms.  The  resistors  are  known  to  be  normally  distributed  about  the 
nominal  value  with  a  standard  deviation  of  1.5  ohms.  What  is  the 
mean  and  standard  deviation  of  the  circuit  resistance? 

17.10  Rework  Problem  17.9  if  the  resistors  are  uniformly  distributed  over 
the  interval  18  to  22  ohms. 

17.11  If two mating parts, X and Y, are each normally distributed with
[...] $\mu_Y = 2.04$ inches and standard deviations [...] inch, what is
the probability of interference?

17.12  With reference to Problem 17.11, what should [...] be (assuming
everything else is unchanged) if the probability of interference is to be
[...]

17.13  For  each  of  the  following  cases,  use   a  Taylor's  series  to  find  the 
approximate  mean  and  variance  of  the  dependent  variable - 

(a)    R      "^ 
(6)    AC 
(<0    R" 


(e)    C  = 

17.14  For a system involving four components (A, B, C, and D) for which
$p_A = 0.9$, $p_B = 0.8$, $p_C = 0.9$, and $p_D = 0.9$, determine $P$ under
the assumption of mutual independence and given that A and B are
connected in parallel (branch I) and that this branch is then connected
[...]

17.15  Consider an electrical circuit consisting of two subcircuits, the
first of which involves the components $X_1$, $X_2$, $X_3$, and $X_4$ in
parallel, and the second of which involves the components $X_5$ and $X_6$
in parallel. If the two subcircuits are connected in series and mutual
independence can be assumed, determine $Q$ if $q_1 = 0.10$, $q_2 = 0.05$,
and $q_3 = q_4 = q_5 = q_6 = 0.02$.

17.16  An equipment consists of three components (D, E, and F) connected
in series. If the reliabilities of the three components are 0.92, 0.95,
and 0.96, what is the reliability of the equipment? State all
assumptions made in achieving your answer.

17.17  An equipment consists of three components (D, E, and F) connected
in parallel. If the reliabilities of the three components are 0.92, 0.95,
and 0.96, what is the reliability of the equipment? State all
assumptions made in achieving your answer.

17.18  An  equipment  consists  of  100  parts,  of  which  20  parts  are  tubes 
connected  functionally  in  series  (branch  A).  This  branch  is  in  turn 
connected   in   series   to  two  parallel   branches   of    60    and   20   parts 
(branches  B  and  C).  The  parts  which  comprise  each  of  these  branches 
are  connected  functionally  in  series.  The  reliability  of  each  tube  is 
0.95,   while  the  geometric  mean  reliability  of  branch  B  is  0.93  and 
the  geometric  mean  reliability  of  branch  C  is  0.96.  Draw  a  simplified 
equipment  diagram  and  determine  the  reliability  of  the  equipment. 
State  all  assumptions  made  in  achieving  your  answer. 

17.19 When reliability is a function of time, it is common practice to
assume the validity of the exponential probability density function
as a failure distribution. That is, it is common to use
$f(t) = \lambda e^{-\lambda t}$ as the basic failure distribution. In such
a case, $\lambda$ is known as the hazard rate (or, in loose language, the
failure rate) and $m = 1/\lambda$ is called the mean time between failures
(MTBF). Under the preceding conditions, the reliability of a component to
time t is

$R(t) = e^{-\lambda t}$

Utilizing the above information, and assuming that the ith component
has a constant hazard rate $\lambda_i$ $(i = 1, \ldots, n)$, express Equations
(17.15) through (17.18) in terms of the $\lambda$'s and t.
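
The exponential model of Problem 17.19 can be sketched numerically. The block below is a minimal illustration, not the text's own solution: it assumes the usual independent series and parallel combination rules, and the function names, hazard rates, and mission time are hypothetical values chosen only for the example.

```python
import math

def component_reliability(hazard_rate, t):
    """R(t) = exp(-lambda * t) for a constant hazard rate lambda."""
    return math.exp(-hazard_rate * t)

def series_reliability(hazard_rates, t):
    """Series system (assumed independent): every component must survive,
    so the reliabilities multiply, giving exp(-t * sum of lambdas)."""
    return math.exp(-t * sum(hazard_rates))

def parallel_reliability(hazard_rates, t):
    """Parallel system (assumed independent): the system fails only if
    every component fails, so R = 1 - product of (1 - R_i)."""
    unreliability = 1.0
    for lam in hazard_rates:
        unreliability *= 1.0 - component_reliability(lam, t)
    return 1.0 - unreliability

# Hypothetical hazard rates (failures per hour) and mission time (hours).
rates = [0.0005, 0.0015]
t = 100.0

print("MTBF of first component:", 1.0 / rates[0])
print("series reliability:     ", series_reliability(rates, t))
print("parallel reliability:   ", parallel_reliability(rates, t))
```

With constant hazard rates the series case reduces to a single exponential whose rate is the sum of the component rates, which is why the sketch computes it in one step.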

17.20 Simplify the answer to the preceding problem on the assumption that
all the $\lambda_i$ are equal.

17.21 If $f(t) = \lambda e^{-\lambda t}$; $\lambda > 0$, $t > 0$; what effect does doubling
the value of $\lambda$ have on:

(a) the MTBF, and (b) $R(t)$?

17.22 Assume an equipment consists of two components for which the
failure rates are $\lambda_1 = 0.001$ and $\lambda_2 = 0.002$, respectively. Calculate
the equipment reliability for t = 100 for the following two cases:
(a) series connection, (b) parallel connection.

17.23 Show that, if n = 4, Equation (17.16) can be approximated by either $4q$
or $4q - 6q^2$. (NOTE: The reliability analyst makes frequent use of such
approximations.)

17.24 The failure rate for a television receiver is 0.02 failures/hour.
(a) Calculate the mean time between failures.

(b) What is the probability of such a receiver failing in the first four
hours?
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A  a  alpha 

Β  β  beta

Δ  δ  delta

Ε  ε  epsilon

Ζ  ζ  zeta

Η  η  eta

Θ  θ  theta

Ι  ι  iota
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kappa 

lambda 

mu. 
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0.01 
0.02 

0.990 
0.980 

  x:    0      1      2      3      4

0.03  0.970
0.04  0.961  0.999
0.05  0.951  0.999
0.06  0.942  0.998
0.07  0.932  0.998
0.08  0.923  0.997
0.09  0.914  0.996
0.10  0.905  0.995
0.15  0.861  0.990  0.999
0.20  0.819  0.982  0.999
0.25  0.779  0.974  0.998
0.30  0.741  0.963  0.996
0.35  0.705  0.951  0.994
0.40  0.670  0.938  0.992  0.999
0.45  0.638  0.925  0.989  0.999
0.50  0.607  0.910  0.986  0.998
0.55  0.577  0.894  0.982  0.998
0.60  0.549  0.878  0.977  0.997
0.65  0.522  0.861  0.972  0.996  0.999

1  Reprinted  from  E.    C.   Molina,   Poisson's  Exponential  Binomial  Limit,    D. 
Van  Nostrand  Company,  Inc.,  New  York,   1947.  By  permission  of  the  author 
and  publishers. 

2  Entries in the table are values of F(x) where

$F(x) = P(c \le x) = \sum_{c=0}^{x} e^{-\mu}\mu^{c}/c!$

*  Blank  spaces  to  the  right  of  the  last  entry  in  any  row  of  the  table  may  be 
read  as  1;  blank  spaces  to  the  left  of  the  first  entry  in  any  row  of  the  table  may 
be  read  as  0. 
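
The tabulated values can be checked numerically. The sketch below assumes the rows of the table are indexed by the Poisson mean and the columns by x, as the footnotes indicate; the function name `poisson_cdf` and the argument name `mean` are hypothetical, introduced only for this illustration.

```python
import math

def poisson_cdf(x, mean):
    """Cumulative Poisson probability P(c <= x) for a given mean.

    Builds the terms e^(-mean) * mean^c / c! recursively, so no
    factorials or large powers are formed explicitly.
    """
    term = math.exp(-mean)      # the c = 0 term
    total = term
    for c in range(1, x + 1):
        term *= mean / c        # term for c obtained from the term for c - 1
        total += term
    return total

# Compare with the table rows for means 1.00 and 2.0.
print(round(poisson_cdf(1, 1.0), 3))
print(round(poisson_cdf(2, 2.0), 3))
```

Rounding the result to three decimals reproduces the table's precision; entries that round to 1.000 or 0.000 correspond to the blank spaces described in the footnote.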
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0.70 
0.75 

0.80 
0.85 
0.90 
0.95 
1.00 

1.1 
1.2 
1.3 
1.4 

1.5 

0.497 
0.472 

0.449 
0.427 
0.407 
0.387 
0.368 

0.333 
0.301 
0.273 
0.247 
0.223 

0.844 
0.827 

0.809 
0.791 
0.772 
0.754 
0.736 

0.699 
0.663 
0.627 
0.592 
0.558 

0.966 
0.959 

0.953 
0.945 
0.937 
0.929 
0.920 

0.900 
0.879 
0.857 
0.833 
0.809 

0.994 
0.993 

0.991 
0.989 
0.987 
0.984
0.981 

0.974 
0.966 
0.957 
0.946 
0.934

0.999 
0.999 

0.999 
0.998 
0.998 
0.997 
0.996 

0.995 
0.992 
0.989 
0.986 
0.981 

0.999 

0.999 
0.998 
0.998 
0.997 
0.996 

0.999 
0.999 

 x:     0      1      2      3      4      5      6      7      8      9     10

1.6   0.202  0.525  0.783  0.921  0.976  0.994  0.999
1.7   0.183  0.493  0.757  0.907  0.970  0.992  0.998
1.8   0.165  0.463  0.731  0.891  0.964  0.990  0.997  0.999
1.9   0.150  0.434  0.704  0.875  0.956  0.987  0.997  0.999
2.0   0.135  0.406  0.677  0.857  0.947  0.983  0.995  0.999

2.1   0.122  0.380  0.650  0.839  0.938  0.980  0.994  0.999
2.2   0.111  0.355  0.623  0.819  0.928  0.975  0.993  0.998
2.3   0.100  0.331  0.596  0.799  0.916  0.970  0.991  0.997  0.999
2.4   0.091  0.308  0.570  0.779  0.904  0.964  0.988  0.997  0.999
2.5   0.082  0.287  0.544  0.758  0.891  0.958  0.986  0.996  0.999
2.6   0.074  0.267  0.518  0.736  0.877  0.951  0.983  0.995  0.999
2.7   0.067  0.249  0.494  0.714  0.863  0.943  0.979  0.993  0.998  0.999
2.8   0.061  0.231  0.469  0.692  0.848  0.935  0.976  0.992  0.998  0.999
2.9   0.055  0.215  0.446  0.670  0.832  0.926  0.971  0.990  0.997  0.999
3.0   0.050  0.199  0.423  0.647  0.815  0.916  0.966  0.988  0.996  0.999

3.2   0.041  0.171  0.380  0.603  0.781  0.895  0.955  0.983  0.994  0.998
3.4   0.033  0.147  0.340  0.558  0.744  0.871  0.942  0.977  0.992  0.997  0.999
3.6   0.027  0.126  0.303  0.515  0.706  0.844  0.927  0.969  0.988  0.996  0.999
3.8   0.022  0.107  0.269  0.473  0.668  0.816  0.909  0.960  0.984  0.994  0.998
4.0   0.018  0.092  0.238  0.433  0.629  0.785  0.889  0.949  0.979  0.992  0.997

4.2   0.015  0.078  0.210  0.395  0.590  0.753  0.867  0.936  0.972  0.989  0.996
4.4   0.012  0.066  0.185  0.359  0.551  0.720  0.844  0.921  0.964  0.985  0.994
4.6   0.010  0.056  0.163  0.326  0.513  0.686  0.818  0.905  0.955  0.980  0.992
4.8   0.008  0.048  0.143  0.294  0.476  0.651  0.791  0.887  0.944  0.975  0.990
5.0   0.007  0.040  0.125  0.265  0.440  0.616  0.762  0.867  0.932  0.968  0.986

5.2   0.006  0.034  0.109  0.238  0.406  0.581  0.732  0.845  0.918  0.960  0.982
5.4   0.005  0.029  0.095  0.213  0.373  0.546  0.702  0.822  0.903  0.951  0.977
5.6   0.004  0.024  0.082  0.191  0.342  0.512  0.670  0.797  0.886  0.941  0.972
5.8   0.003  0.021  0.072  0.170  0.313  0.478  0.638  0.771  0.867  0.929  0.965
6.0   0.002  0.017  0.062  0.151  0.285  0.446  0.606  0.744  0.847  0.916  0.957

 x:    11     12     13     14     15     16     17     18     19     20     21

3.2
3.4
3.6
3.8   0.999
4.0   0.999

4.2   0.999
4.4   0.998  0.999
4.6   0.997  0.999
4.8   0.996  0.999
5.0   0.995  0.998  0.999

5.2   0.993  0.997  0.999
5.4   0.990  0.996  0.999
5.6   0.988  0.995  0.998  0.999
5.8   0.984  0.993  0.997  0.999
6.0   0.980  0.991  0.996  0.999  0.999



 x:     0      1      2      3      4      5      6      7      8      9     10

6.2   0.002  0.015  0.054  0.134  0.259  0.414  0.574  0.716  0.826  0.902  0.949
6.4   0.002  0.012  0.046  0.119  0.235  0.384  0.542  0.687  0.803  0.886  0.939
6.6   0.001  0.010  0.040  0.105  0.213  0.355  0.511  0.658  0.780  0.869  0.927
6.8   0.001  0.009  0.034  0.093  0.192  0.327  0.480  0.628  0.755  0.850  0.915
7.0   0.001  0.007  0.030  0.082  0.173  0.301  0.450  0.599  0.729  0.830  0.901

7.2   0.001  0.006  0.025  0.072  0.156  0.276  0.420  0.569  0.703  0.810  0.887
7.4   0.001  0.005  0.022  0.063  0.140  0.253  0.392  0.539  0.676  0.788  0.871
7.6   0.001  0.004  0.019  0.055  0.125  0.231  0.365  0.510  0.648  0.765  0.854
7.8          0.004  0.016  0.048  0.112  0.210  0.338  0.481  0.620  0.741  0.835
8.0          0.003  0.014  0.042  0.100  0.191  0.313  0.453  0.593  0.717  0.816

8.2          0.003  0.012  0.037  0.089  0.174  0.290  0.425  0.565  0.692  0.796
8.4          0.002  0.010  0.032  0.079  0.157  0.267  0.399  0.537  0.666  0.774

 x:    11     12     13     14     15     16     17     18     19     20     21

6.2   0.975  0.989  0.995  0.998  0.999
6.4   0.969  0.986  0.994  0.997  0.999
6.6   0.963  0.982  0.992  0.997  0.999  0.999
6.8   0.955  0.978  0.990  0.996  0.998  0.999
7.0   0.947  0.973  0.987  0.994  0.998  0.999

7.2   0.937  0.967  0.984  0.993  0.997  0.999  0.999
7.4   0.926  0.961  0.980  0.991  0.996  0.998  0.999
7.6   0.915  0.954  0.976  0.989  0.995  0.998  0.999
7.8   0.902  0.945  0.971  0.986  0.993  0.997  0.999
8.0   0.888  0.936  0.966  0.983  0.992  0.996  0.998  0.999

8.2   0.873  0.926  0.960  0.979  0.990  0.995  0.998  0.999
8.4   0.857  0.915  0.952  0.975  0.987  0.994  0.997  0.999
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ON  ON  ON  ON  ON  ON 

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APPENDIX     31 

CUMULATIVE  STANDARD 
NORMAL  DISTRIBUTION2 


   z     G(z)         z     G(z)         z     G(z)

-4.00  0.00003    -3.60  0.00016    -3.20  0.00069
-3.99  0.00003    -3.59  0.00017    -3.19  0.00071
-3.98  0.00003    -3.58  0.00017    -3.18  0.00074
-3.97  0.00004    -3.57  0.00018    -3.17  0.00076
-3.96  0.00004    -3.56  0.00019    -3.16  0.00079

-3.95  0.00004    -3.55  0.00019    -3.15  0.00082
-3.94  0.00004    -3.54  0.00020    -3.14  0.00084
-3.93  0.00004    -3.53  0.00021    -3.13  0.00087
-3.92  0.00004    -3.52  0.00022    -3.12  0.00090
-3.91  0.00005    -3.51  0.00022    -3.11  0.00094

-3.90  0.00005    -3.50  0.00023    -3.10  0.00097
-3.89  0.00005    -3.49  0.00024    -3.09  0.00100
-3.88  0.00005    -3.48  0.00025    -3.08  0.00104
-3.87  0.00005    -3.47  0.00026    -3.07  0.00107
-3.86  0.00006    -3.46  0.00027    -3.06  0.00111

-3.85  0.00006    -3.45  0.00028    -3.05  0.00114
-3.84  0.00006    -3.44  0.00029    -3.04  0.00118
-3.83  0.00006    -3.43  0.00030    -3.03  0.00122
-3.82  0.00007    -3.42  0.00031    -3.02  0.00126
-3.81  0.00007    -3.41  0.00032    -3.01  0.00131

-3.80  0.00007    -3.40  0.00034    -3.00  0.00135
-3.79  0.00008    -3.39  0.00035    -2.99  0.00139
-3.78  0.00008    -3.38  0.00036    -2.98  0.00144
-3.77  0.00008    -3.37  0.00038    -2.97  0.00149
-3.76  0.00008    -3.36  0.00039    -2.96  0.00154

-3.75  0.00009    -3.35  0.00040    -2.95  0.00159
-3.74  0.00009    -3.34  0.00042    -2.94  0.00164
-3.73  0.00010    -3.33  0.00043    -2.93  0.00169
-3.72  0.00010    -3.32  0.00045    -2.92  0.00175
-3.71  0.00010    -3.31  0.00047    -2.91  0.00181

-3.70  0.00011    -3.30  0.00048    -2.90  0.00187
-3.69  0.00011    -3.29  0.00050    -2.89  0.00193
-3.68  0.00012    -3.28  0.00052    -2.88  0.00199
-3.67  0.00012    -3.27  0.00054    -2.87  0.00205
-3.66  0.00013    -3.26  0.00056    -2.86  0.00212

-3.65  0.00013    -3.25  0.00058    -2.85  0.00219
-3.64  0.00014    -3.24  0.00060    -2.84  0.00226
-3.63  0.00014    -3.23  0.00062    -2.83  0.00233
-3.62  0.00015    -3.22  0.00064    -2.82  0.00240
-3.61  0.00015    -3.21  0.00066    -2.81  0.00248

1  Abridged  from  Karl  Pearson,  Tables  for  Statisticians  and  Biometricians, 
Part  I,  Cambridge  University  Press,  London,  1924,  pp.  2-6.  By  permission  of 
the  author  and  publishers. 


$G(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt$
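
Entries of the cumulative standard normal table can be reproduced with the error function. This is a minimal sketch under the assumption that G(z) denotes the standard normal cumulative distribution function; the function name `G` simply mirrors the table heading, and the complementary error function is used here only as one convenient way to evaluate it.

```python
import math

def G(z):
    """Cumulative standard normal distribution at z.

    Uses G(z) = 0.5 * erfc(-z / sqrt(2)), which stays accurate in the
    far lower tail where 1 + erf(...) would lose precision.
    """
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Compare with the tabulated entries for z = -3.00, 0.00, 1.00, and 1.96.
for z in (-3.00, 0.00, 1.00, 1.96):
    print(f"{z:5.2f}  {G(z):.5f}")
```

Because the distribution is symmetric, G(-z) = 1 - G(z), so the negative half of the table follows from the positive half.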


   z     G(z)         z     G(z)         z     G(z)

-2.80  0.00256    -2.30  0.01072    -1.80  0.03593
-2.79  0.00264    -2.29  0.01101    -1.79  0.03673
-2.78  0.00272    -2.28  0.01130    -1.78  0.03754
-2.77  0.00280    -2.27  0.01160    -1.77  0.03836
-2.76  0.00289    -2.26  0.01191    -1.76  0.03920

-2.75  0.00298    -2.25  0.01222    -1.75  0.04006
-2.74  0.00307    -2.24  0.01255    -1.74  0.04093
-2.73  0.00317    -2.23  0.01287    -1.73  0.04182
-2.72  0.00326    -2.22  0.01321    -1.72  0.04272
-2.71  0.00336    -2.21  0.01355    -1.71  0.04363

-2.70  0.00347    -2.20  0.01390    -1.70  0.04457
-2.69  0.00357    -2.19  0.01426    -1.69  0.04551
-2.68  0.00368    -2.18  0.01463    -1.68  0.04648
-2.67  0.00379    -2.17  0.01500    -1.67  0.04746
-2.66  0.00391    -2.16  0.01539    -1.66  0.04846

-2.65  0.00402    -2.15  0.01578    -1.65  0.04947
-2.64  0.00415    -2.14  0.01618    -1.64  0.05050
-2.63  0.00427    -2.13  0.01659    -1.63  0.05155
-2.62  0.00440    -2.12  0.01700    -1.62  0.05262
-2.61  0.00453    -2.11  0.01743    -1.61  0.05370

-2.60  0.00466    -2.10  0.01786    -1.60  0.05480
-2.59  0.00480    -2.09  0.01831    -1.59  0.05592
-2.58  0.00494    -2.08  0.01876    -1.58  0.05705
-2.57  0.00508    -2.07  0.01923    -1.57  0.05821
-2.56  0.00523    -2.06  0.01970    -1.56  0.05938

-2.55  0.00539    -2.05  0.02018    -1.55  0.06057
-2.54  0.00554    -2.04  0.02068    -1.54  0.06178
-2.53  0.00570    -2.03  0.02118    -1.53  0.06301
-2.52  0.00587    -2.02  0.02169    -1.52  0.06426
-2.51  0.00604    -2.01  0.02222    -1.51  0.06552

-2.50  0.00621    -2.00  0.02275    -1.50  0.06681
-2.49  0.00639    -1.99  0.02330    -1.49  0.06811
-2.48  0.00657    -1.98  0.02385    -1.48  0.06944
-2.47  0.00676    -1.97  0.02442    -1.47  0.07078
-2.46  0.00695    -1.96  0.02500    -1.46  0.07215

-2.45  0.00714    -1.95  0.02559    -1.45  0.07353
-2.44  0.00734    -1.94  0.02619    -1.44  0.07493
-2.43  0.00755    -1.93  0.02680    -1.43  0.07636
-2.42  0.00776    -1.92  0.02743    -1.42  0.07780
-2.41  0.00798    -1.91  0.02807    -1.41  0.07927

-2.40  0.00820    -1.90  0.02872    -1.40  0.08076
-2.39  0.00842    -1.89  0.02938    -1.39  0.08226
-2.38  0.00866    -1.88  0.03005    -1.38  0.08379
-2.37  0.00889    -1.87  0.03074    -1.37  0.08534
-2.36  0.00914    -1.86  0.03144    -1.36  0.08691

-2.35  0.00939    -1.85  0.03216    -1.35  0.08851
-2.34  0.00964    -1.84  0.03288    -1.34  0.09012
-2.33  0.00990    -1.83  0.03362    -1.33  0.09176
-2.32  0.01017    -1.82  0.03438    -1.32  0.09342
-2.31  0.01044    -1.81  0.03515    -1.31  0.09510
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GOO 


   z     G(z)         z     G(z)         z     G(z)

-1.30  0.09680    -0.85  0.19766    -0.40  0.34458
-1.29  0.09853    -0.84  0.20045    -0.39  0.34827
-1.28  0.10027    -0.83  0.20327    -0.38  0.35197
-1.27  0.10204    -0.82  0.20611    -0.37  0.35569
-1.26  0.10383    -0.81  0.20897    -0.36  0.35942

-1.25  0.10565    -0.80  0.21186    -0.35  0.36317
-1.24  0.10749    -0.79  0.21476    -0.34  0.36693
-1.23  0.10935    -0.78  0.21770    -0.33  0.37070
-1.22  0.11123    -0.77  0.22065    -0.32  0.37448
-1.21  0.11314    -0.76  0.22363    -0.31  0.37828

-1.20  0.11507    -0.75  0.22663    -0.30  0.38209
-1.19  0.11702    -0.74  0.22965    -0.29  0.38591
-1.18  0.11900    -0.73  0.23270    -0.28  0.38974
-1.17  0.12100    -0.72  0.23576    -0.27  0.39358
-1.16  0.12302    -0.71  0.23885    -0.26  0.39743

-1.15  0.12507    -0.70  0.24196    -0.25  0.40129
-1.14  0.12714    -0.69  0.24510    -0.24  0.40517
-1.13  0.12924    -0.68  0.24825    -0.23  0.40905
-1.12  0.13136    -0.67  0.25143    -0.22  0.41294
-1.11  0.13350    -0.66  0.25463    -0.21  0.41683

-1.10  0.13567    -0.65  0.25785    -0.20  0.42074
-1.09  0.13786    -0.64  0.26109    -0.19  0.42465
-1.08  0.14007    -0.63  0.26435    -0.18  0.42858
-1.07  0.14231    -0.62  0.26763    -0.17  0.43251
-1.06  0.14457    -0.61  0.27093    -0.16  0.43644

-1.05  0.14686    -0.60  0.27425    -0.15  0.44038
-1.04  0.14917    -0.59  0.27760    -0.14  0.44433
-1.03  0.15150    -0.58  0.28096    -0.13  0.44828
-1.02  0.15386    -0.57  0.28434    -0.12  0.45224
-1.01  0.15625    -0.56  0.28774    -0.11  0.45620

-1.00  0.15866    -0.55  0.29116    -0.10  0.46017
-0.99  0.16109    -0.54  0.29460    -0.09  0.46414
-0.98  0.16354    -0.53  0.29806    -0.08  0.46812
-0.97  0.16602    -0.52  0.30153    -0.07  0.47210
-0.96  0.16853    -0.51  0.30503    -0.06  0.47608

-0.95  0.17106    -0.50  0.30854    -0.05  0.48006
-0.94  0.17361    -0.49  0.31207    -0.04  0.48405
-0.93  0.17619    -0.48  0.31561    -0.03  0.48803
-0.92  0.17879    -0.47  0.31918    -0.02  0.49202
-0.91  0.18141    -0.46  0.32276    -0.01  0.49601

-0.90  0.18406    -0.45  0.32636     0.00  0.50000
-0.89  0.18673    -0.44  0.32997     0.01  0.50399
-0.88  0.18943    -0.43  0.33360     0.02  0.50798
-0.87  0.19215    -0.42  0.33724     0.03  0.51197
-0.86  0.19489    -0.41  0.34090     0.04  0.51595
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CO) 


  z     G(z)        z     G(z)        z     G(z)

0.05  0.51994    0.50  0.69146    0.95  0.82894
0.06  0.52392    0.51  0.69497    0.96  0.83147
0.07  0.52790    0.52  0.69847    0.97  0.83398
0.08  0.53188    0.53  0.70194    0.98  0.83646
0.09  0.53586    0.54  0.70540    0.99  0.83891

0.10  0.53983    0.55  0.70884    1.00  0.84134
0.11  0.54380    0.56  0.71226    1.01  0.84375
0.12  0.54776    0.57  0.71566    1.02  0.84614
0.13  0.55172    0.58  0.71904    1.03  0.84850
0.14  0.55567    0.59  0.72240    1.04  0.85083

0.15  0.55962    0.60  0.72575    1.05  0.85314
0.16  0.56356    0.61  0.72907    1.06  0.85543
0.17  0.56749    0.62  0.73237    1.07  0.85769
0.18  0.57142    0.63  0.73565    1.08  0.85993
0.19  0.57535    0.64  0.73891    1.09  0.86214

0.20  0.57926    0.65  0.74215    1.10  0.86433
0.21  0.58317    0.66  0.74537    1.11  0.86650
0.22  0.58706    0.67  0.74857    1.12  0.86864
0.23  0.59095    0.68  0.75175    1.13  0.87076
0.24  0.59483    0.69  0.75490    1.14  0.87286

0.25  0.59871    0.70  0.75804    1.15  0.87493
0.26  0.60257    0.71  0.76115    1.16  0.87698
0.27  0.60642    0.72  0.76424    1.17  0.87900
0.28  0.61026    0.73  0.76730    1.18  0.88100
0.29  0.61409    0.74  0.77035    1.19  0.88298

0.30  0.61791    0.75  0.77337    1.20  0.88493
0.31  0.62172    0.76  0.77637    1.21  0.88686
0.32  0.62552    0.77  0.77935    1.22  0.88877
0.33  0.62930    0.78  0.78230    1.23  0.89065
0.34  0.63307    0.79  0.78524    1.24  0.89251

0.35  0.63683    0.80  0.78814    1.25  0.89435
0.36  0.64058    0.81  0.79103    1.26  0.89617
0.37  0.64431    0.82  0.79389    1.27  0.89796
0.38  0.64803    0.83  0.79673    1.28  0.89973
0.39  0.65173    0.84  0.79955    1.29  0.90147

0.40  0.65542    0.85  0.80234    1.30  0.90320
0.41  0.65910    0.86  0.80511    1.31  0.90490
0.42  0.66276    0.87  0.80785    1.32  0.90658
0.43  0.66640    0.88  0.81057    1.33  0.90824
0.44  0.67003    0.89  0.81327    1.34  0.90988

0.45  0.67364    0.90  0.81594    1.35  0.91149
0.46  0.67724    0.91  0.81859    1.36  0.91309
0.47  0.68082    0.92  0.82121    1.37  0.91466
0.48  0.68439    0.93  0.82381    1.38  0.91621
0.49  0.68793    0.94  0.82639    1.39  0.91774



  z     G(z)        z     G(z)        z     G(z)

1.40  0.91924    1.85  0.96784    2.30  0.98928
1.41  0.92073    1.86  0.96856    2.31  0.98956
1.42  0.92220    1.87  0.96926    2.32  0.98983
1.43  0.92364    1.88  0.96995    2.33  0.99010
1.44  0.92507    1.89  0.97062    2.34  0.99036

1.45  0.92647    1.90  0.97128    2.35  0.99061
1.46  0.92785    1.91  0.97193    2.36  0.99086
1.47  0.92922    1.92  0.97257    2.37  0.99111
1.48  0.93056    1.93  0.97320    2.38  0.99134
1.49  0.93189    1.94  0.97381    2.39  0.99158

1.50  0.93319    1.95  0.97441    2.40  0.99180
1.51  0.93448    1.96  0.97500    2.41  0.99202
1.52  0.93574    1.97  0.97558    2.42  0.99224
1.53  0.93699    1.98  0.97615    2.43  0.99245
1.54  0.93822    1.99  0.97670    2.44  0.99266

1.55  0.93943    2.00  0.97725    2.45  0.99286
1.56  0.94062    2.01  0.97778    2.46  0.99305
1.57  0.94179    2.02  0.97831    2.47  0.99324
1.58  0.94295    2.03  0.97882    2.48  0.99343
1.59  0.94408    2.04  0.97932    2.49  0.99361

1.60  0.94520    2.05  0.97982    2.50  0.99379
1.61  0.94630    2.06  0.98030    2.51  0.99396
1.62  0.94738    2.07  0.98077    2.52  0.99413
1.63  0.94845    2.08  0.98124    2.53  0.99430
1.64  0.94950    2.09  0.98169    2.54  0.99446

1.65  0.95053    2.10  0.98214    2.55  0.99461
1.66  0.95154    2.11  0.98257    2.56  0.99477
1.67  0.95254    2.12  0.98300    2.57  0.99492
1.68  0.95352    2.13  0.98341    2.58  0.99506
1.69  0.95449    2.14  0.98382    2.59  0.99520

1.70  0.95543    2.15  0.98422    2.60  0.99534
1.71  0.95637    2.16  0.98461    2.61  0.99547
1.72  0.95728    2.17  0.98500    2.62  0.99560
1.73  0.95818    2.18  0.98537    2.63  0.99573
1.74  0.95907    2.19  0.98574    2.64  0.99585

1.75  0.95994    2.20  0.98610    2.65  0.99598
1.76  0.96080    2.21  0.98645    2.66  0.99609
1.77  0.96164    2.22  0.98679    2.67  0.99621
1.78  0.96246    2.23  0.98713    2.68  0.99632
1.79  0.96327    2.24  0.98745    2.69  0.99643

1.80  0.96407    2.25  0.98778    2.70  0.99653
1.81  0.96485    2.26  0.98809    2.71  0.99664
1.82  0.96562    2.27  0.98840    2.72  0.99674
1.83  0.96638    2.28  0.98870    2.73  0.99683
1.84  0.96712    2.29  0.98899    2.74  0.99693
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z 

  z     G(z)        z     G(z)        z     G(z)

2.75  0.99702    3.20  0.99931    3.65  0.99987
2.76  0.99711    3.21  0.99934    3.66  0.99987
2.77  0.99720    3.22  0.99936    3.67  0.99988
2.78  0.99728    3.23  0.99938    3.68  0.99988
2.79  0.99736    3.24  0.99940    3.69  0.99989

2.80  0.99744    3.25  0.99942    3.70  0.99989
2.81  0.99752    3.26  0.99944    3.71  0.99990
2.82  0.99760    3.27  0.99946    3.72  0.99990
2.83  0.99767    3.28  0.99948    3.73  0.99990
2.84  0.99774    3.29  0.99950    3.74  0.99991

2.85  0.99781    3.30  0.99952    3.75  0.99991
2.86  0.99788    3.31  0.99953    3.76  0.99992
2.87  0.99795    3.32  0.99955    3.77  0.99992
2.88  0.99801    3.33  0.99957    3.78  0.99992
2.89  0.99807    3.34  0.99958    3.79  0.99992

2.90  0.99813    3.35  0.99960    3.80  0.99993
2.91  0.99819    3.36  0.99961    3.81  0.99993
2.92  0.99825    3.37  0.99962    3.82  0.99993
2.93  0.99831    3.38  0.99964    3.83  0.99994
2.94  0.99836    3.39  0.99965    3.84  0.99994

2.95  0.99841    3.40  0.99966    3.85  0.99994
2.96  0.99846    3.41  0.99968    3.86  0.99994
2.97  0.99851    3.42  0.99969    3.87  0.99995
2.98  0.99856    3.43  0.99970    3.88  0.99995
2.99  0.99861    3.44  0.99971    3.89  0.99995

3.00  0.99865    3.45  0.99972    3.90  0.99995
3.01  0.99869    3.46  0.99973    3.91  0.99995
3.02  0.99874    3.47  0.99974    3.92  0.99996
3.03  0.99878    3.48  0.99975    3.93  0.99996
3.04  0.99882    3.49  0.99976    3.94  0.99996

3.05  0.99886    3.50  0.99977    3.95  0.99996
3.06  0.99889    3.51  0.99978    3.96  0.99996
3.07  0.99893    3.52  0.99978    3.97  0.99996
3.08  0.99897    3.53  0.99979    3.98  0.99997
3.09  0.99900    3.54  0.99980    3.99  0.99997

3.10  0.99903    3.55  0.99981    4.00  0.99997
3.11  0.99906    3.56  0.99981
3.12  0.99910    3.57  0.99982
3.13  0.99913    3.58  0.99983
3.14  0.99916    3.59  0.99983

3.15  0.99918    3.60  0.99984
3.16  0.99921    3.61  0.99985
3.17  0.99924    3.62  0.99985
3.18  0.99926    3.63  0.99986
3.19  0.99929    3.64  0.99986
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APPENDIX    41 

CUMULATIVE  CHI-SQUARE 
DISTRIBUTION2 

1  Adapted from A. Hald and S. A. Sinkbaek, "A table of percentage points of
the χ²-distribution," Skandinavisk Aktuarietidskrift, 1950, pp. 170-75. By
permission of the authors and publishers.
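
The percentage points in this appendix can be approximated numerically. The sketch below is one possible approach, not the method used to construct the table: it assumes P is the lower-tail cumulative probability and ν the degrees of freedom, evaluates the chi-square distribution function through the series for the regularized incomplete gamma function, and inverts it by bisection. The function names `chi2_cdf` and `chi2_point` are hypothetical.

```python
import math

def chi2_cdf(x, nu):
    """P(chi-square with nu degrees of freedom <= x).

    Uses the series for the regularized lower incomplete gamma function:
    P(a, y) = e^(-y) y^a / Gamma(a) * sum_{k>=0} y^k / (a (a+1) ... (a+k)),
    with a = nu / 2 and y = x / 2.
    """
    if x <= 0.0:
        return 0.0
    a = nu / 2.0
    y = x / 2.0
    term = 1.0 / a              # k = 0 term of the series
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= y / (a + k)     # next term from the previous one
        total += term
        k += 1
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))

def chi2_point(p, nu):
    """Value x such that chi2_cdf(x, nu) = p, found by bisection (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, nu) < p:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Compare with the tabulated entries for (P, nu) = (0.95, 1) and (0.50, 10).
print(chi2_point(0.95, 1))
print(chi2_point(0.50, 10))
```

The series converges for every positive x but needs more terms as x grows, so for very large degrees of freedom a continued-fraction evaluation would be the more economical choice.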

2  Entries in the table are values of $\chi^2_P$ where
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V 

 ν   P = 0.0005    0.001       0.005      0.01      0.025     0.05     0.10   0.20   0.30   0.40

 1   0.000000393  0.00000157  0.0000393  0.000157  0.000982  0.00393  0.0158  0.0642  0.148  0.275
 2   0.00100      0.00200     0.0100     0.0201    0.0506    0.103    0.211   0.446   0.713  1.02
 3   0.0153       0.0243      0.0717     0.115     0.216     0.352    0.584   1.00    1.42   1.87
 4   0.0639       0.0908      0.207      0.297     0.484     0.711    1.06    1.65    2.19   2.75
 5   0.158        0.210       0.412      0.554     0.831     1.15     1.61    2.34    3.00   3.66

 6   0.299        0.381       0.676      0.872     1.24      1.64     2.20    3.07    3.83   4.57
 7   0.485        0.598       0.989      1.24      1.69      2.17     2.83    3.82    4.67   5.49
 8   0.710        0.857       1.34       1.65      2.18      2.73     3.49    4.59    5.53   6.42
 9   0.972        1.15        1.73       2.09      2.70      3.33     4.17    5.38    6.39   7.36
10   1.26         1.48        2.16       2.56      3.25      3.94     4.87    6.18    7.27   8.30

11   1.59         1.83        2.60       3.05      3.82      4.57     5.58    6.99    8.15   9.24
12   1.93         2.21        3.07       3.57      4.40      5.23     6.30    7.81    9.03   10.2
13   2.31         2.62        3.57       4.11      5.01      5.89     7.04    8.63    9.93   11.1
14   2.70         3.04        4.07       4.66      5.63      6.57     7.79    9.47    10.8   12.1
15   3.11         3.48        4.60       5.23      6.26      7.26     8.55    10.3    11.7   13.0

16   3.54         3.94        5.14       5.81      6.91      7.96     9.31    11.2    12.6   14.0
17   3.98         4.42        5.70       6.41      7.56      8.67     10.1    12.0    13.5   14.9
18   4.44         4.90        6.26       7.01      8.23      9.39     10.9    12.9    14.4   15.9
19   4.91         5.41        6.84       7.63      8.91      10.1     11.7    13.7    15.4   16.9
20   5.40         5.92        7.43       8.26      9.59      10.9     12.4    14.6    16.3   17.8

21   5.90         6.45        8.03       8.90      10.3      11.6     13.2    15.4    17.2   18.8
22   6.40         6.98        8.64       9.54      11.0      12.3     14.0    16.3    18.1   19.7
23   6.92         7.53        9.26       10.2      11.7      13.1     14.8    17.2    19.0   20.7
24   7.45         8.08        9.89       10.9      12.4      13.8     15.7    18.1    19.9   21.7
25   7.99         8.65        10.5       11.5      13.1      14.6     16.5    18.9    20.9   22.6

26   8.54         9.22        11.2       12.2      13.8      15.4     17.3    19.8    21.8   23.6
27   9.09         9.80        11.8       12.9      14.6      16.2     18.1    20.7    22.7   24.5
28   9.66         10.4        12.5       13.6      15.3      16.9     18.9    21.6    23.6   25.5
29   10.2         11.0        13.1       14.3      16.0      17.7     19.8    22.5    24.6   26.5
30   10.8         11.6        13.8       15.0      16.8      18.5     20.6    23.4    25.5   27.4

31   11.4         12.2        14.5       15.7      17.5      19.3     21.4    24.3    26.4   28.4
32   12.0         12.8        15.1       16.4      18.3      20.1     22.3    25.1    27.4   29.4
33   12.6         13.4        15.8       17.1      19.0      20.9     23.1    26.0    28.3   30.3
34   13.2         14.1        16.5       17.8      19.8      21.7     24.0    26.9    29.2   31.3
35   13.8         14.7        17.2       18.5      20.6      22.5     24.8    27.8    30.2   32.3

36   14.4         15.3        17.9       19.2      21.3      23.3     25.6    28.7    31.1   33.3
37   15.0         16.0        18.6       20.0      22.1      24.1     26.5    29.6    32.1   34.2
38   15.6         16.6        19.3       20.7      22.9      24.9     27.3    30.5    33.0   35.2
39   16.3         17.3        20.0       21.4      23.7      25.7     28.2    31.4    33.9   36.2
40   16.9         17.9        20.7       22.2      24.4      26.5     29.1    32.3    34.9   37.1

41   17.5         18.6        21.4       22.9      25.2      27.3     29.9    33.3    35.8   38.1
42   18.2         19.2        22.1       23.7      26.0      28.1     30.8    34.2    36.8   39.1
43   18.8         19.9        22.9       24.4      26.8      29.0     31.6    35.1    37.7   40.0
44   19.5         20.6        23.6       25.1      27.6      29.8     32.5    36.0    38.6   41.0
45   20.1         21.3        24.3       25.9      28.4      30.6     33.4    36.9    39.6   42.0

46   20.8         21.9        25.0       26.7      29.2      31.4     34.2    37.8    40.5   43.0
47   21.5         22.6        25.8       27.4      30.0      32.3     35.1    38.7    41.5   43.9
48   22.1         23.3        26.5       28.2      30.8      33.1     35.9    39.6    42.4   44.9
49   22.8         24.0        27.2       28.9      31.6      33.9     36.8    40.5    43.4   45.9
50   23.5         24.7        28.0       29.7      32.4      34.8     37.7    41.4    44.3   46.9



 ν   P = 0.50  0.60   0.70   0.80   0.90   0.95   0.975  0.99   0.995  0.999  0.9995

 1   0.455  0.708  1.07   1.64   2.71   3.84   5.02   6.63   7.88   10.8   12.1
 2   1.39   1.83   2.41   3.22   4.61   5.99   7.38   9.21   10.6   13.8   15.2
 3   2.37   2.95   3.67   4.64   6.25   7.81   9.35   11.3   12.8   16.3   17.7
 4   3.36   4.04   4.88   5.99   7.78   9.49   11.1   13.3   14.9   18.5   20.0
 5   4.35   5.13   6.06   7.29   9.24   11.1   12.8   15.1   16.7   20.5   22.1

 6   5.35   6.21   7.23   8.56   10.6   12.6   14.4   16.8   18.5   22.5   24.1
 7   6.35   7.28   8.38   9.80   12.0   14.1   16.0   18.5   20.3   24.3   26.0
 8   7.34   8.35   9.52   11.0   13.4   15.5   17.5   20.1   22.0   26.1   27.9
 9   8.34   9.41   10.7   12.2   14.7   16.9   19.0   21.7   23.6   27.9   29.7
10   9.34   10.5   11.8   13.4   16.0   18.3   20.5   23.2   25.2   29.6   31.4

11   10.3   11.5   12.9   14.6   17.3   19.7   21.9   24.7   26.8   31.3   33.1
12   11.3   12.6   14.0   15.8   18.5   21.0   23.3   26.2   28.3   32.9   34.8
13   12.3   13.6   15.1   17.0   19.8   22.4   24.7   27.7   29.8   34.5   36.5
14   13.3   14.7   16.2   18.2   21.1   23.7   26.1   29.1   31.3   36.1   38.1
15   14.3   15.7   17.3   19.3   22.3   25.0   27.5   30.6   32.8   37.7   39.7

16   15.3   16.8   18.4   20.5   23.5   26.3   28.8   32.0   34.3   39.3   41.3
17   16.3   17.8   19.5   21.6   24.8   27.6   30.2   33.4   35.7   40.8   42.9
18   17.3   18.9   20.6   22.8   26.0   28.9   31.5   34.8   37.2   42.3   44.4
19   18.3   19.9   21.7   23.9   27.2   30.1   32.9   36.2   38.6   43.8   46.0
20   19.3   21.0   22.8   25.0   28.4   31.4   34.2   37.6   40.0   45.3   47.5

21   20.3   22.0   23.9   26.2   29.6   32.7   35.5   38.9   41.4   46.8   49.0
22   21.3   23.0   24.9   27.3   30.8   33.9   36.8   40.3   42.8   48.3   50.5
23   22.3   24.1   26.0   28.4   32.0   35.2   38.1   41.6   44.2   49.7   52.0
24   23.3   25.1   27.1   29.6   33.2   36.4   39.4   43.0   45.6   51.2   53.5
25   24.3   26.1   28.2   30.7   34.4   37.7   40.6   44.3   46.9   52.6   54.9

26   25.3   27.2   29.2   31.8   35.6   38.9   41.9   45.6   48.3   54.1   56.4
27   26.3   28.2   30.3   32.9   36.7   40.1   43.2   47.0   49.6   55.5   57.9
28   27.3   29.2   31.4   34.0   37.9   41.3   44.5   48.3   51.0   56.9   59.3
29   28.3   30.3   32.5   35.1   39.1   42.6   45.7   49.6   52.3   58.3   60.7
30   29.3   31.3   33.5   36.3   40.3   43.8   47.0   50.9   53.7   59.7   62.2

31   30.3   32.3   34.6   37.4   41.4   45.0   48.2   52.2   55.0   61.1   63.6
32   31.3   33.4   35.7   38.5   42.6   46.2   49.5   53.5   56.3   62.5   65.0
33   32.3   34.4   36.7   39.6   43.7   47.4   50.7   54.8   57.6   63.9   66.4
34   33.3   35.4   37.8   40.7   44.9   48.6   52.0   56.1   59.0   65.2   67.8
35   34.3   36.5   38.9   41.8   46.1   49.8   53.2   57.3   60.3   66.6   69.2

36   35.3   37.5   39.9   42.9   47.2   51.0   54.4   58.6   61.6   68.0   70.6
37   36.3   38.5   41.0   44.0   48.4   52.2   55.7   59.9   62.9   69.3   72.0
38   37.3   39.6   42.0   45.1   49.5   53.4   56.9   61.2   64.2   70.7   73.4
39   38.3   40.6   43.1   46.2   50.7   54.6   58.1   62.4   65.5   72.1   74.7
40   39.3   41.6   44.2   47.3   51.8   55.8   59.3   63.7   66.8   73.4   76.1

41   40.3   42.7   45.2   48.4   52.9   56.9   60.6   65.0   68.1   74.7   77.5
42   41.3   43.7   46.3   49.5   54.1   58.1   61.8   66.2   69.3   76.1   78.8
43   42.3   44.7   47.3   50.5   55.2   59.3   63.0   67.5   70.6   77.4   80.2
44   43.3   45.7   48.4   51.6   56.4   60.5   64.2   68.7   71.9   78.7   81.5
45   44.3   46.8   49.5   52.7   57.5   61.7   65.4   70.0   73.2   80.1   82.9

46 
47 
48 
49 
50 

45.3 
46.3 
47.3 
48.3 
49.3 

47.8 
48.8 
49.8 
50.9 
51.9 

50.5 
51.6 
52.6 
53.7 
54.7 

53.8 
54.9 
56.0 
57.1 
58.2 

58.6 
59.8 
60.9 
62.0 
63.2 

62.8 
64.0 
65.2 
66.3 
67.5 

66.6 
67.8 
69.0 
70.2 
71.4 

71.2 
72.4 
73,7 
74.9 
76.2 

74.4 
75.7 
77.0 

78.2 
79.5 

81.4 
82.7 
84.0 
85.4 
86.7 

84.2 
85.6 
86.9 
88.2 
89.6 

[525] 


v\p 0.0005  0.001  0.005   0.01  0.025   0.05   0.10   0.20   0.30   0.40   0.50
 51   24.1   25.4   28.7   30.5   33.2   35.6   38.6   42.4   45.3   47.8   50.3
 52   24.8   26.1   29.5   31.2   34.0   36.4   39.4   43.3   46.2   48.8   51.3
 53   25.5   26.8   30.2   32.0   34.8   37.3   40.3   44.2   47.2   49.8   52.3
 54   26.2   27.5   31.0   32.8   35.6   38.1   41.2   45.1   48.1   50.8   53.3
 55   26.9   28.2   31.7   33.6   36.4   39.0   42.1   46.0   49.1   51.7   54.3
 56   27.6   28.9   32.5   34.3   37.2   39.8   42.9   47.0   50.0   52.7   55.3
 57   28.2   29.6   33.2   35.1   38.0   40.6   43.8   47.9   51.0   53.7   56.3
 58   28.9   30.3   34.0   35.9   38.8   41.5   44.7   48.8   51.9   54.7   57.3
 59   29.6   31.0   34.8   36.7   39.7   42.3   45.6   49.7   52.9   55.6   58.3
 60   30.3   31.7   35.5   37.5   40.5   43.2   46.5   50.6   53.8   56.6   59.3
 61   31.0   32.5   36.3   38.3   41.3   44.0   47.3   51.6   54.8   57.6   60.3
 62   31.7   33.2   37.1   39.1   42.1   44.9   48.2   52.5   55.7   58.6   61.3
 63   32.5   33.9   37.8   39.9   43.0   45.7   49.1   53.4   56.7   59.6   62.3
 64   33.2   34.6   38.6   40.6   43.8   46.6   50.0   54.3   57.6   60.5   63.3
 65   33.9   35.4   39.4   41.4   44.6   47.4   50.9   55.3   58.6   61.5   64.3
 66   34.6   36.1   40.2   42.2   45.4   48.3   51.8   56.2   59.5   62.5   65.3
 67   35.3   36.8   40.9   43.0   46.3   49.2   52.7   57.1   60.5   63.5   66.3
 68   36.0   37.6   41.7   43.8   47.1   50.0   53.5   58.0   61.4   64.4   67.3
 69   36.7   38.3   42.5   44.6   47.9   50.9   54.4   59.0   62.4   65.4   68.3
 70   37.5   39.0   43.3   45.4   48.8   51.7   55.3   59.9   63.3   66.4   69.3
 71   38.2   39.8   44.1   46.2   49.6   52.6   56.2   60.8   64.3   67.4   70.3
 72   38.9   40.5   44.8   47.1   50.4   53.5   57.1   61.8   65.3   68.4   71.3
 73   39.6   41.3   45.6   47.9   51.3   54.3   58.0   62.7   66.2   69.3   72.3
 74   40.4   42.0   46.4   48.7   52.1   55.2   58.9   63.6   67.2   70.3   73.3
 75   41.1   42.8   47.2   49.5   52.9   56.1   59.8   64.5   68.1   71.3   74.3
 76   41.8   43.5   48.0   50.3   53.8   56.9   60.7   65.5   69.1   72.3   75.3
 77   42.6   44.3   48.8   51.1   54.6   57.8   61.6   66.4   70.0   73.2   76.3
 78   43.3   45.0   49.6   51.9   55.5   58.7   62.5   67.3   71.0   74.2   77.3
 79   44.1   45.8   50.4   52.7   56.3   59.5   63.4   68.3   72.0   75.2   78.3
 80   44.8   46.5   51.2   53.5   57.2   60.4   64.3   69.2   72.9   76.2   79.3
 81   45.5   47.3   52.0   54.4   58.0   61.3   65.2   70.1   73.9   77.2   80.3
 82   46.3   48.0   52.8   55.2   58.8   62.1   66.1   71.1   74.8   78.1   81.3
 83   47.0   48.8   53.6   56.0   59.7   63.0   67.0   72.0   75.8   79.1   82.3
 84   47.8   49.6   54.4   56.8   60.5   63.9   67.9   72.9   76.8   80.1   83.3
 85   48.5   50.3   55.2   57.6   61.4   64.7   68.8   73.9   77.7   81.1   84.3
 86   49.3   51.1   56.0   58.5   62.2   65.6   69.7   74.8   78.7   82.1   85.3
 87   50.0   51.9   56.8   59.3   63.1   66.5   70.6   75.7   79.6   83.0   86.3
 88   50.8   52.6   57.6   60.1   63.9   67.4   71.5   76.7   80.6   84.0   87.3
 89   51.5   53.4   58.4   60.9   64.8   68.2   72.4   77.6   81.6   85.0   88.3
 90   52.3   54.2   59.2   61.8   65.6   69.1   73.3   78.6   82.5   86.0   89.3
 91   53.0   54.9   60.0   62.6   66.5   70.0   74.2   79.5   83.5   87.0   90.3
 92   53.8   55.7   60.8   63.4   67.4   70.9   75.1   80.4   84.4   88.0   91.3
 93   54.5   56.5   61.6   64.2   68.2   71.8   76.0   81.4   85.4   88.9   92.3
 94   55.3   57.2   62.4   65.1   69.1   72.6   76.9   82.3   86.4   89.9   93.3
 95   56.1   58.0   63.2   65.9   69.9   73.5   77.8   83.2   87.3   90.9   94.3
 96   56.8   58.8   64.1   66.7   70.8   74.4   78.7   84.2   88.3   91.9   95.3
 97   57.6   59.6   64.9   67.6   71.6   75.3   79.6   85.1   89.2   92.9   96.3
 98   58.4   60.4   65.7   68.4   72.5   76.2   80.5   86.1   90.2   93.8   97.3
 99   59.1   61.1   66.5   69.2   73.4   77.0   81.4   87.0   91.2   94.8   98.3
100   59.9   61.9   67.3   70.1   74.2   77.9   82.4   87.9   92.1   95.8   99.3

[526]


v\p   0.60   0.70   0.80   0.90   0.95  0.975   0.99  0.995  0.999 0.9995
 51   52.9   55.8   59.2   64.3   68.7   72.6   77.4   80.7   88.0   90.9
 52   53.9   56.8   60.3   65.4   69.8   73.8   78.6   82.0   89.3   92.2
 53   55.0   57.9   61.4   66.5   71.0   75.0   79.8   83.3   90.6   93.5
 54   56.0   58.9   62.5   67.7   72.2   76.2   81.1   84.5   91.9   94.8
 55   57.0   60.0   63.6   68.8   73.3   77.4   82.3   85.7   93.2   96.2
 56   58.0   61.0   64.7   69.9   74.5   78.6   83.5   87.0   94.5   97.5
 57   59.1   62.1   65.7   71.0   75.6   79.8   84.7   88.2   95.8   98.8
 58   60.1   63.1   66.8   72.2   76.8   80.9   86.0   89.5   97.0  100.1
 59   61.1   64.2   67.9   73.3   77.9   82.1   87.2   90.7   98.3  101.4
 60   62.1   65.2   69.0   74.4   79.1   83.3   88.4   92.0   99.6  102.7
 61   63.2   66.3   70.0   75.5   80.2   84.5   89.6   93.2  100.9  104.0
 62   64.2   67.3   71.1   76.6   81.4   85.7   90.8   94.4  102.2  105.3
 63   65.2   68.4   72.2   77.7   82.5   86.8   92.0   95.6  103.4  106.6
 64   66.2   69.4   73.3   78.9   83.7   88.0   93.2   96.9  104.7  107.9
 65   67.2   70.5   74.4   80.0   84.8   89.2   94.4   98.1  106.0  109.2
 66   68.3   71.5   75.4   81.1   86.0   90.3   95.6   99.3  107.3  110.5
 67   69.3   72.6   76.5   82.2   87.1   91.5   96.8  100.6  108.5  111.7
 68   70.3   73.6   77.6   83.3   88.3   92.7   98.0  101.8  109.8  113.0
 69   71.3   74.6   78.6   84.4   89.4   93.9   99.2  103.0  111.1  114.3
 70   72.4   75.7   79.7   85.5   90.5   95.0  100.4  104.2  112.3  115.6
 71   73.4   76.7   80.8   86.6   91.7   96.2  101.6  105.4  113.6  116.9
 72   74.4   77.8   81.9   87.7   92.8   97.4  102.8  106.6  114.8  118.1
 73   75.4   78.8   82.9   88.8   93.9   98.5  104.0  107.9  116.1  119.4
 74   76.4   79.9   84.0   90.0   95.1   99.7  105.2  109.1  117.3  120.7
 75   77.5   80.9   85.1   91.1   96.2  100.8  106.4  110.3  118.6  121.9
 76   78.5   82.0   86.1   92.2   97.4  102.0  107.6  111.5  119.9  123.2
 77   79.5   83.0   87.2   93.3   98.5  103.2  108.8  112.7  121.1  124.5
 78   80.5   84.0   88.3   94.4   99.6  104.3  110.0  113.9  122.3  125.7
 79   81.5   85.1   89.3   95.5  100.7  105.5  111.1  115.1  123.6  127.0
 80   82.6   86.1   90.4   96.6  101.9  106.6  112.3  116.3  124.8  128.3
 81   83.6   87.2   91.5   97.7  103.0  107.8  113.5  117.5  126.1  129.5
 82   84.6   88.2   92.5   98.8  104.1  108.9  114.7  118.7  127.3  130.8
 83   85.6   89.2   93.6   99.9  105.3  110.1  115.9  119.9  128.6  132.0
 84   86.6   90.3   94.7  101.0  106.4  111.2  117.1  121.1  129.8  133.3
 85   87.7   91.3   95.7  102.1  107.5  112.4  118.2  122.3  131.0  134.5
 86   88.7   92.4   96.8  103.2  108.6  113.5  119.4  123.5  132.3  135.8
 87   89.7   93.4   97.9  104.3  109.8  114.7  120.6  124.7  133.5  137.0
 88   90.7   94.4   98.9  105.4  110.9  115.8  121.8  125.9  134.7  138.3
 89   91.7   95.5  100.0  106.5  112.0  117.0  122.9  127.1  136.0  139.5
 90   92.8   96.5  101.1  107.6  113.1  118.1  124.1  128.3  137.2  140.8
 91   93.8   97.6  102.1  108.7  114.3  119.3  125.3  129.5  138.4  142.0
 92   94.8   98.6  103.2  109.8  115.4  120.4  126.5  130.7  139.7  143.3
 93   95.8   99.6  104.2  110.9  116.5  121.6  127.6  131.9  140.9  144.5
 94   96.8  100.7  105.3  111.9  117.6  122.7  128.8  133.1  142.1  145.8
 95   97.9  101.7  106.4  113.0  118.8  123.9  130.0  134.2  143.3  147.0
 96   98.9  102.8  107.4  114.1  119.9  125.0  131.1  135.4  144.6  148.2
 97   99.9  103.8  108.5  115.2  121.0  126.1  132.3  136.6  145.8  149.5
 98  100.9  104.8  109.5  116.3  122.1  127.3  133.5  137.8  147.0  150.7
 99  101.9  105.9  110.6  117.4  123.2  128.4  134.6  139.0  148.2  151.9
100  102.9  106.9  111.7  118.5  124.3  129.6  135.8  140.2  149.4  153.2
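
For degrees of freedom beyond the range of the table, or as a rough check on a tabulated entry, the percentage points can be approximated from the normal distribution. The sketch below uses the Wilson-Hilferty cube-root approximation; this is a standard textbook approximation and is not stated in the source as the method by which the table was computed. The function name `chi2_quantile_wh` is an illustrative choice.

```python
from statistics import NormalDist


def chi2_quantile_wh(p: float, v: int) -> float:
    """Approximate the chi-square value with cumulative probability p and
    v degrees of freedom using the Wilson-Hilferty cube-root formula."""
    z = NormalDist().inv_cdf(p)      # standard normal deviate for cumulative probability p
    c = 2.0 / (9.0 * v)
    return v * (1.0 - c + z * c ** 0.5) ** 3


# Compare the approximation with three entries from the v = 100 row of the table.
for p, tabulated in [(0.90, 118.5), (0.95, 124.3), (0.99, 135.8)]:
    approx = chi2_quantile_wh(p, 100)
    print(f"p = {p}: approximation {approx:.1f}, tabulated {tabulated}")
```

The approximation is usually adequate for moderate and large v; for very small v or extreme p the tabulated values should be preferred.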
APPENDIX 5¹


CUMULATIVE t-DISTRIBUTION


  v\p     0.75     0.80     0.85     0.90     0.95    0.975    0.995   0.9995
    1    1.000    1.376    1.963    3.078    6.314   12.706   63.657  636.619
    2    0.816    1.061    1.386    1.886    2.920    4.303    9.925   31.598
    3    0.765    0.978    1.250    1.638    2.353    3.182    5.841   12.941
    4    0.741    0.941    1.190    1.533    2.132    2.776    4.604    8.610
    5    0.727    0.920    1.156    1.476    2.015    2.571    4.032    6.859
    6    0.718    0.906    1.134    1.440    1.943    2.447    3.707    5.959
    7    0.711    0.896    1.119    1.415    1.895    2.365    3.499    5.405
    8    0.706    0.889    1.108    1.397    1.860    2.306    3.355    5.041
    9    0.703    0.883    1.100    1.383    1.833    2.262    3.250    4.781
   10    0.700    0.879    1.093    1.372    1.812    2.228    3.169    4.587
   11    0.697    0.876    1.088    1.363    1.796    2.201    3.106    4.437
   12    0.695    0.873    1.083    1.356    1.782    2.179    3.055    4.318
   13    0.694    0.870    1.079    1.350    1.771    2.160    3.012    4.221
   14    0.692    0.868    1.076    1.345    1.761    2.145    2.977    4.140
   15    0.691    0.866    1.074    1.341    1.753    2.131    2.947    4.073
   16    0.690    0.866    1.071    1.337    1.746    2.120    2.921    4.015
   17    0.689    0.863    1.069    1.333    1.740    2.110    2.898    3.965
   18    0.688    0.862    1.067    1.330    1.734    2.101    2.878    3.922
   19    0.688    0.861    1.066    1.328    1.729    2.093    2.861    3.883
   20    0.687    0.860    1.064    1.325    1.725    2.086    2.845    3.850
   21    0.686    0.859    1.063    1.323    1.721    2.080    2.831    3.819
   22    0.686    0.858    1.061    1.321    1.717    2.074    2.819    3.792
   23    0.685    0.858    1.060    1.319    1.714    2.069    2.807    3.767
   24    0.685    0.857    1.059    1.318    1.711    2.064    2.797    3.745
   25    0.684    0.856    1.058    1.316    1.708    2.060    2.787    3.725
   26    0.684    0.856    1.058    1.315    1.706    2.056    2.779    3.707
   27    0.684    0.855    1.057    1.314    1.703    2.052    2.771    3.690
   28    0.683    0.855    1.056    1.313    1.701    2.048    2.763    3.674
   29    0.683    0.854    1.055    1.311    1.699    2.045    2.756    3.659
   30    0.683    0.854    1.055    1.310    1.697    2.042    2.750    3.646
   35    0.682    0.852    1.052    1.306    1.690    2.030    2.724    3.591
   40    0.681    0.851    1.050    1.303    1.684    2.021    2.704    3.551
   45    0.680    0.850    1.048    1.301    1.680    2.014    2.690    3.520
   50    0.680    0.849    1.047    1.299    1.676    2.008    2.678    3.496
   55    0.679    0.849    1.047    1.297    1.673    2.004    2.669    3.476
   60    0.679    0.848    1.046    1.296    1.671    2.000    2.660    3.460
   70    0.678    0.847    1.045    1.294    1.667    1.994    2.648    3.435
   80    0.678    0.847    1.044    1.293    1.665    1.990    2.638    3.416
   90    0.678    0.846    1.043    1.291    1.662    1.987    2.632    3.402
  100    0.677    0.846    1.042    1.290    1.661    1.984    2.626    3.390
  200    0.676    0.844    1.039    1.286    1.653    1.972    2.601    3.340
  300    0.676    0.843    1.038    1.285    1.650    1.968    2.592    3.323
  400    0.676    0.843    1.038    1.284    1.649    1.966    2.588    3.315
  500    0.676    0.843    1.037    1.284    1.648    1.965    2.586    3.310
 1000    0.675    0.842    1.037    1.283    1.647    1.962    2.581    3.301
    ∞  0.67449  0.84162  1.03643  1.28155  1.64485  1.95996  2.57582  3.29053
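
The last row of the table (v = ∞) is the limiting case in which the t-distribution coincides with the standard normal distribution, so its entries can be reproduced from normal percentage points. The following sketch does this with Python's standard library; the names `P_LEVELS`, `TABULATED`, and `normal_limit_row` are illustrative and not part of the source.

```python
from statistics import NormalDist

# Cumulative probabilities heading the columns of the t table.
P_LEVELS = [0.75, 0.80, 0.85, 0.90, 0.95, 0.975, 0.995, 0.9995]
# Entries printed in the v = infinity row of the table.
TABULATED = [0.67449, 0.84162, 1.03643, 1.28155, 1.64485, 1.95996, 2.57582, 3.29053]


def normal_limit_row(levels):
    """Return the standard normal deviate for each cumulative probability."""
    std_normal = NormalDist()        # mean 0, standard deviation 1
    return [std_normal.inv_cdf(p) for p in levels]


for p, z, tab in zip(P_LEVELS, normal_limit_row(P_LEVELS), TABULATED):
    print(f"p = {p}: computed {z:.5f}, tabulated {tab}")
```

The computed values should agree with the printed row to within about one unit in the fifth decimal place; small differences of that size can arise from rounding in the printed table.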

1 Partly from Table III of R. A. Fisher and Frank Yates, Statistical Tables for
Biological, Agricultural and Medical Research, third ed., Oliver and Boyd,
Edinburgh, 1948. By permission of the authors and publishers.

2 Entries in the table are values of t_p where


APPENDIX 6¹

CUMULATIVE F-DISTRIBUTION

1 Reproduced from Table A-7c of W. J. Dixon and F. J. Massey, Introduction
to Statistical Analysis, second ed., McGraw-Hill Book Company, Inc., New
York, 1957. By permission of the authors and publishers. However, since most
of the values in Dixon and Massey were extracted from other publications,
permission was also requested of the primary sources noted below. In each case
permission was granted to reproduce the needed material.

(a) All values for v1, v2 equal to 50, 100, 200, and 500 are from A. Hald,
Statistical Tables and Formulas, John Wiley and Sons, Inc., New York


(b) For cumulative proportions .5, .75, .9, .95, .975, .99, and .995, most of
the values are from M. Merrington and C. M. Thompson, "Tables of
percentage points of the inverted beta (F) distribution," Biometrika,
Vol. 33, Part I, April, 1943, pp. 74-87.

(c) For cumulative proportions .999, the values are from C. C. Colcord and
L. S. Deming, "The one-tenth percent level of Z," Sankhya, Vol. 2, Part 4,
Dec., 1936, pp. 423-24.

(d) As noted in Dixon and Massey, the remaining values were found by
forming reciprocals or by interpolation.
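
The reciprocal rule mentioned in (d) can be illustrated briefly. A lower percentage point with (v1, v2) degrees of freedom is the reciprocal of the corresponding upper percentage point with the degrees of freedom interchanged, that is, F_p(v1, v2) = 1 / F_{1-p}(v2, v1). The sketch below is only an illustration of that identity; the function name `lower_tail_from_upper` is hypothetical.

```python
def lower_tail_from_upper(f_upper: float) -> float:
    """Given F_{1-p}(v2, v1), return F_p(v1, v2) = 1 / F_{1-p}(v2, v1)."""
    if f_upper <= 0:
        raise ValueError("an F percentage point must be positive")
    return 1.0 / f_upper


# Tabulated F_.95 for v1 = 1, v2 = 2 is 18.5; its reciprocal estimates F_.05 for v1 = 2, v2 = 1.
print(round(lower_tail_from_upper(18.5), 3))

# Tabulated F_.999 for v1 = 1, v2 = 3 is 167; its reciprocal estimates F_.001 for v1 = 3, v2 = 1.
print(round(lower_tail_from_upper(167), 4))
```

The two results, 0.054 and 0.006, agree with the corresponding entries in the v2 = 1 section of the table.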

2  Entries  in  the  table  are  values  of  Fp  where 
 v2       p   v1=1      2      3      4      5      6      7      8      9     10     11     12
  1   .0005  .0⁶62  .0³50  .0²38  .0²94   .016   .022   .027   .032   .036   .039   .042   .045
      .001   .0⁵25  .0²10  .0²60   .013   .021   .028   .034   .039   .044   .048   .051   .054
      .005   .0⁴62  .0²51   .018   .032   .044   .054   .062   .068   .073   .078   .082   .085
      .01    .0³25   .010   .029   .047   .062   .073   .082   .089   .095   .100   .104   .107
      .025   .0²15   .026   .057   .082   .100   .113   .124   .132   .139   .144   .149   .153
      .05    .0²62   .054   .099   .130   .151   .167   .179   .188   .195   .201   .207   .211
      .10     .025   .117   .181   .220   .246   .265   .279   .289   .298   .304   .310   .315
      .25     .172   .389   .494   .553   .591   .617   .637   .650   .661   .670   .680   .684
      .50     1.00   1.50   1.71   1.82   1.89   1.94   1.98   2.00   2.03   2.04   2.05   2.07
      .75     5.83   7.50   8.20   8.58   8.82   8.98   9.10   9.19   9.26   9.32   9.36   9.41
      .90     39.9   49.5   53.6   55.8   57.2   58.2   58.9   59.4   59.9   60.2   60.5   60.7
      .95      161    200    216    225    230    234    237    239    241    242    243    244
      .975     648    800    864    900    922    937    948    957    963    969    973    977
      .99     405¹   500¹   540¹   562¹   576¹   586¹   593¹   598¹   602¹   606¹   608¹   611¹
      .995    162²   200²   216²   225²   231²   234²   237²   239²   241²   242²   243²   244²
      .999    406³   500³   540³   562³   576³   586³   593³   598³   602³   606³   609³   611³
      .9995   162⁴   200⁴   216⁴   225⁴   231⁴   234⁴   237⁴   239⁴   241⁴   242⁴   243⁴   244⁴

  2   .0005  .0⁶50  .0³50  .0²42   .011   .020   .029   .037   .044   .050   .056   .061   .065
      .001   .0⁵20  .0²10  .0²68   .016   .027   .037   .046   .054   .061   .067   .072   .077
      .005   .0⁴50  .0²50   .020   .038   .055   .069   .081   .091   .099   .106   .112   .118
      .01    .0³20   .010   .032   .056   .075   .092   .105   .116   .125   .132   .139   .144
      .025   .0²13   .026   .062   .094   .119   .138   .153   .165   .175   .183   .190   .196
      .05    .0²50   .053   .105   .144   .173   .194   .211   .224   .235   .244   .251   .257
      .10     .020   .111   .183   .231   .265   .289   .307   .321   .333   .342   .350   .356
      .25     .133   .333   .439   .500   .540   .568   .588   .604   .616   .626   .633   .641
      .50     .667   1.00   1.13   1.21   1.25   1.28   1.30   1.32   1.33   1.34   1.35   1.36
      .75     2.57   3.00   3.15   3.23   3.28   3.31   3.34   3.35   3.37   3.38   3.39   3.39
      .90     8.53   9.00   9.16   9.24   9.29   9.33   9.35   9.37   9.38   9.39   9.40   9.41
      .95     18.5   19.0   19.2   19.2   19.3   19.3   19.4   19.4   19.4   19.4   19.4   19.4
      .975    38.5   39.0   39.2   39.2   39.3   39.3   39.4   39.4   39.4   39.4   39.4   39.4
      .99     98.5   99.0   99.2   99.2   99.3   99.3   99.4   99.4   99.4   99.4   99.4   99.4
      .995     198    199    199    199    199    199    199    199    199    199    199    199
      .999     998    999    999    999    999    999    999    999    999    999    999    999
      .9995   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹

  3   .0005  .0⁶46  .0³50  .0²44   .012   .023   .033   .043   .052   .060   .067   .074   .079
      .001   .0⁵19  .0²10  .0²71   .018   .030   .042   .053   .063   .072   .079   .086   .093
      .005   .0⁴46  .0²50   .021   .041   .060   .077   .092   .104   .115   .124   .132   .138
      .01    .0³19   .010   .034   .060   .083   .102   .118   .132   .143   .153   .161   .168
      .025   .0²12   .026   .065   .100   .129   .152   .170   .185   .197   .207   .216   .224
      .05    .0²46   .052   .108   .152   .185   .210   .230   .246   .259   .270   .279   .287
      .10     .019   .109   .185   .239   .276   .304   .325   .342   .356   .367   .376   .384
      .25     .122   .317   .424   .489   .531   .561   .582   .600   .613   .624   .633   .641
      .50     .585   .881   1.00   1.06   1.10   1.13   1.15   1.16   1.17   1.18   1.19   1.20
      .75     2.02   2.28   2.36   2.39   2.41   2.42   2.43   2.44   2.44   2.44   2.45   2.45
      .90     5.54   5.46   5.39   5.34   5.31   5.28   5.27   5.25   5.24   5.23   5.22   5.22
      .95     10.1   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.76   8.74
      .975    17.4   16.0   15.4   15.1   14.9   14.7   14.6   14.5   14.5   14.4   14.4   14.3
      .99     34.1   30.8   29.5   28.7   28.2   27.9   27.7   27.5   27.3   27.2   27.1   27.1
      .995    55.6   49.8   47.5   46.2   45.4   44.8   44.4   44.1   43.9   43.7   43.5   43.4
      .999     167    149    141    137    135    133    132    131    130    129    129    128
      .9995    266    237    225    218    214    211    209    208    207    206    204    204

Read .0³56 as .00056, 200¹ as 2000, 162⁴ as 1620000, etc.
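
The compact notation described in the note can be expanded mechanically. The sketch below assumes, following the note's own examples, that a raised digit after ".0" counts the zeros that follow the decimal point and that a raised digit after a whole number counts the zeros to append. The function names are illustrative only.

```python
def leading_zeros(zero_count: int, digits: str) -> float:
    """Expand an entry printed as '.0' with a raised zero_count, followed by digits:
    zero_count zeros follow the decimal point, then the significant digits."""
    return float("0." + "0" * zero_count + digits)


def trailing_zeros(leading_digits: int, zero_count: int) -> int:
    """Expand an entry printed as a whole number with a raised zero_count:
    the printed digits are followed by zero_count zeros."""
    return leading_digits * 10 ** zero_count


print(leading_zeros(3, "56"))     # first example in the note: .00056
print(trailing_zeros(200, 1))     # second example in the note: 2000
print(trailing_zeros(162, 4))     # third example in the note: 1620000
```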


 v2       p  v1=15     20     24     30     40     50     60    100    120    200    500      ∞
  1   .0005   .051   .058   .062   .066   .069   .072   .074   .077   .078   .080   .081   .083
      .001    .060   .067   .071   .075   .079   .082   .084   .087   .088   .089   .091   .092
      .005    .093   .101   .105   .109   .113   .116   .118   .121   .122   .124   .126   .127
      .01     .115   .124   .128   .132   .137   .139   .141   .145   .146   .148   .150   .151
      .025    .161   .170   .175   .180   .184   .187   .189   .193   .194   .196   .198   .199
      .05     .220   .230   .235   .240   .245   .248   .250   .254   .255   .257   .259   .261
      .10     .325   .336   .342   .347   .353   .356   .358   .362   .364   .366   .368   .370
      .25     .698   .712   .719   .727   .734   .738   .741   .747   .749   .752   .754   .756
      .50     2.09   2.12   2.13   2.15   2.16   2.17   2.17   2.18   2.18   2.19   2.19   2.20
      .75     9.49   9.58   9.63   9.67   9.71   9.74   9.76   9.78   9.80   9.82   9.84   9.85
      .90     61.2   61.7   62.0   62.3   62.5   62.7   62.8   63.0   63.1   63.2   63.3   63.3
      .95      246    248    249    250    251    252    252    253    253    254    254    254
      .975     985    993    997   100¹   101¹   101¹   101¹   101¹   101¹   102¹   102¹   102¹
      .99     616¹   621¹   623¹   626¹   629¹   630¹   631¹   633¹   634¹   635¹   636¹   637¹
      .995    246²   248²   249²   250²   251²   252²   253²   253²   254²   254²   254²   255²
      .999    616³   621³   623³   626³   629³   630³   631³   633³   634³   635³          637³
      .9995   246⁴   248⁴   249⁴   250⁴   251⁴   252⁴   252⁴   253⁴   253⁴   253⁴   254⁴   254⁴

  2   .0005   .076   .088   .094   .101   .108   .113   .116   .122   .124   .127   .130   .132
      .001    .088   .100   .107   .114   .121   .126   .129   .135   .137   .140   .143   .145
      .005    .130   .143   .150   .157   .165   .169   .173   .179   .181   .184   .187   .189
      .01     .157   .171   .178   .186   .193   .198   .201   .207   .209   .212   .215   .217
      .025    .210   .224   .232   .239   .247   .251   .255   .261   .263   .266   .269   .271
      .05     .272   .286   .294   .302   .309   .314   .317   .324   .326   .329   .332   .334
      .10     .371   .386   .394   .402   .410   .415   .418   .424   .426   .429   .433   .434
      .25     .657   .672   .680   .689   .697   .702   .705   .711   .713   .716   .719   .721
      .50     1.38   1.39   1.40   1.41   1.42   1.42   1.43   1.43   1.43   1.44   1.44   1.44
      .75     3.41   3.43   3.43   3.44   3.45   3.45   3.46   3.47   3.47   3.48   3.48   3.48
      .90     9.42   9.44   9.45   9.46   9.47   9.47   9.47   9.48   9.48   9.49   9.49   9.49
      .95     19.4   19.4   19.5   19.5   19.5   19.5   19.5   19.5   19.5   19.5   19.5   19.5
      .975    39.4   39.4   39.5   39.5   39.5   39.5   39.5   39.5   39.5   39.5   39.5   39.5
      .99     99.4   99.4   99.5   99.5   99.5   99.5   99.5   99.5   99.5   99.5   99.5   99.5
      .995     199    199    199    199    199    199    199    199    199    199    199    200
      .999     999    999    999    999    999    999    999    999    999    999    999    999
      .9995   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹   200¹

  3   .0005   .093   .109   .117   .127   .136   .143   .147   .156   .158   .162   .166   .169
      .001    .107   .123   .132   .142   .152   .158   .162   .171   .173   .177   .181   .184
      .005    .154   .172   .181   .191   .201   .207   .211   .220   .222   .227   .231   .234
      .01     .185   .203   .212   .222   .232   .238   .242   .251   .253   .258   .262   .264
      .025    .241   .259   .269   .279   .289   .295   .299   .308   .310   .314   .318   .321
      .05     .304   .323   .332   .342   .352   .358   .363   .370   .373   .377   .382   .384
      .10     .402   .420   .430   .439   .449   .455   .459   .467   .469   .474   .476   .480
      .25     .658   .675   .684   .693   .702   .708   .711   .719   .721   .724   .728   .730
      .50     1.21   1.23   1.23   1.24   1.25   1.25   1.25   1.26   1.26   1.26   1.27   1.27
      .75     2.46   2.46   2.46   2.47   2.47   2.47   2.47   2.47   2.47   2.47   2.47   2.47
      .90     5.20   5.18   5.18   5.17   5.16   5.15   5.15   5.14   5.14   5.14   5.14   5.13
      .95     8.70   8.66   8.63   8.62   8.59   8.58   8.57   8.55   8.55   8.54   8.53   8.53
      .975    14.3   14.2   14.1   14.1   14.0   14.0   14.0   14.0   13.9   13.9   13.9   13.9
      .99     26.9   26.7   26.6   26.5   26.4   26.4   26.3   26.2   26.2   26.2   26.1   26.1
      .995    43.1   42.8   42.6   42.5   42.3   42.2   42.1   42.0   42.0   41.9   41.9   41.8
      .999     127    126    126    125    125    125    124    124    124    124    124    123
      .9995    203    201    200    199    199    198    198    197    197    197    196    196
 v2       p   v1=1      2      3      4      5      6      7      8      9     10     11     12
  4   .0005  .0⁶44  .0³50  .0²46   .013   .024   .036   .047   .057   .066   .075   .082   .089
      .001   .0⁵18  .0²10  .0²73   .019   .032   .046   .058   .069   .079   .089   .097   .104
      .005   .0⁴44  .0²50   .022   .043   .064   .083   .100   .114   .126   .137   .145   .153
      .01    .0³18   .010   .035   .063   .088   .109   .127   .143   .156   .167   .176   .185
      .025   .0²11   .026   .066   .104   .135   .161   .181   .198   .212   .224   .234   .243
      .05    .0²44   .052   .110   .157   .193   .221   .243   .261   .275   .288   .298   .307
      .10     .018   .108   .187   .243   .284   .314   .338   .356   .371   .384   .394   .403
      .25     .117   .309   .418   .484   .528   .560   .583   .601   .615   .627   .637   .645
      .50     .549   .828   .941   1.00   1.04   1.06   1.08   1.09   1.10   1.11   1.12   1.13
      .75     1.81   2.00   2.05   2.06   2.07   2.08   2.08   2.08   2.08   2.08   2.08   2.08
      .90     4.54   4.32   4.19   4.11   4.05   4.01   3.98   3.95   3.94   3.92   3.91   3.90
      .95     7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.94   5.91
      .975    12.2   10.6   9.98   9.60   9.36   9.20   9.07   8.98   8.90   8.84   8.79   8.75
      .99     21.2   18.0   16.7   16.0   15.5   15.2   15.0   14.8   14.7   14.5   14.4   14.4
      .995    31.3   26.3   24.3   23.2   22.5   22.0   21.6
3.14

3.02 

2.90 
 v2       p  v1=15     20     24     30     40     50     60    100    120    200    500      ∞
120   .0005   .199   .256   .293   .338   .390   .429   .458   .524   .543   .578   .614   .676
      .001    .223   .282   .319   .363   .415   .453   .480   .542   .568   .595   .631   .691
      .005    .297   .356   .393   .434   .484   .520   .545   .605   .623   .661   .702   .733
      .01     .338   .397   .433   .474   .522   .556   .579   .636   .652   .688   .725   .755
      .025    .406   .464   .498   .536   .580   .611   .633   .684   .698   .729   .762   .789
      .05     .473   .527   .559   .594   .634   .661   .682   .727   .740   .767   .785   .819
      .10     .560   .609   .636   .667   .702   .726   .742   .781   .791   .815   .838   .855
      .25     .730   .765   .784   .805   .828   .843   .853   .877   .884   .897   .911   .923
      .50     .961   .972   .978   .983   .989   .992   .994   1.00   1.00   1.00   1.01   1.01
      .75     1.24   1.22   1.21   1.19   1.18   1.17   1.16   1.14   1.13   1.12   1.11   1.10
      .90     1.55   1.48   1.45   1.41   1.37   1.34   1.32   1.27   1.26   1.24   1.21   1.19
      .95     1.75   1.66   1.61   1.55   1.50   1.46   1.43   1.37   1.35   1.32   1.28   1.25
      .975    1.95   1.82   1.76   1.69   1.61   1.56   1.53   1.45   1.43   1.39   1.34   1.31
      .99     2.19   2.03   1.95   1.86   1.76   1.70   1.66   1.56   1.53   1.48   1.42   1.38
      .995    2.37   2.19   2.09   1.98   1.87   1.80   1.75   1.64   1.61   1.54   1.48   1.43
      .999    2.78   2.53   2.40   2.26   2.11   2.02   1.95   1.82   1.76   1.70   1.62   1.54
      .9995   2.96   2.67   2.53   2.38   2.21   2.11   2.01   1.88   1.84   1.75   1.67   1.60

  ∞   .0005   .207   .270   .311   .360   .422   .469   .505   .599   .624   .704   .804   1.00
      .001    .232   .296   .338   .386   .448   .493   .527   .617   .649   .719   .819   1.00
      .005    .307   .372   .412   .460   .518   .559   .592   .671   .699   .762   .843   1.00
      .01     .349   .413   .452   .499   .554   .595   .625   .699   .724   .782   .858   1.00
      .025    .418   .480   .517   .560   .611   .645   .675   .741   .763   .813   .878   1.00
      .05     .484   .543   .577   .617   .663   .694   .720   .781   .797   .840   .896   1.00
      .10     .570   .622   .652   .687   .726   .752   .774   .826   .838   .877   .919   1.00
      .25     .736   .773   .793   .816   .842   .860   .872   .901   .910   .932   .957   1.00
      .50     .956   .967   .972   .978   .983   .987   .989   .993   .994   .997   .999   1.00
      .75     1.22   1.19   1.18   1.16   1.14   1.13   1.12   1.09   1.08   1.07   1.04   1.00
      .90     1.49   1.42   1.38   1.34   1.30   1.26   1.24   1.18   1.17   1.13   1.08   1.00
      .95     1.67   1.57   1.52   1.46   1.39   1.35   1.32   1.24   1.22   1.17   1.11   1.00
      .975    1.83   1.71   1.64   1.57   1.48   1.43   1.39   1.30   1.27   1.21   1.13   1.00
      .99     2.04   1.88   1.79   1.70   1.59   1.52   1.47   1.36   1.32   1.25   1.15   1.00
      .995    2.19   2.00   1.90   1.79   1.67   1.59   1.53   1.40   1.36   1.28   1.17   1.00
      .999    2.51   2.27   2.13   1.99   1.84   1.73   1.66   1.49   1.45   1.34   1.21   1.00
      .9995   2.65   2.37   2.22   2.07   1.91   1.79   1.71   1.53   1.48   1.36   1.22   1.00
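
The v2 = ∞ rows of the F table can be cross-checked against the chi-square table, because when the denominator degrees of freedom become infinite the F percentage point equals the chi-square percentage point divided by its degrees of freedom, F_p(v1, ∞) = χ²_p(v1) / v1. The sketch below applies this to the v = 100 entries of the chi-square table and the v1 = 100 column above; the names `f_limit_from_chi2` and `CHECKS` are illustrative, and agreement is only expected to within the rounding of the printed tables.

```python
def f_limit_from_chi2(chi2_value: float, v1: int) -> float:
    """Return F_p(v1, infinity) computed as chi-square_p(v1) / v1."""
    return chi2_value / v1


# (p, chi-square entry for v = 100, F entry for v1 = 100 and v2 = infinity)
CHECKS = [
    (0.0005, 59.9, 0.599),
    (0.50, 99.3, 0.993),
    (0.95, 124.3, 1.24),
    (0.99, 135.8, 1.36),
]

for p, chi2_value, f_tabulated in CHECKS:
    f_computed = f_limit_from_chi2(chi2_value, 100)
    print(f"p = {p}: chi-square/v1 = {f_computed:.3f}, tabulated F = {f_tabulated}")
```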

[543]


APPENDIX 7¹

RANDOM  NUMBERS 
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00 

39591 

66O82 

48626 

95780 

55228 

87189 

75717 

97042 

19696 

48613 

01 

463O4 

97377 

43462 

21739 

14566 

72533 

60171 

29024 

77581 

72760 

02 

99547 

60779 

22734 

23678 

44895 

89767 

18249 

41702 

35850 

40543 

03 

06743 

63537 

24553 

77225 

94743 

79448 

12753 

95986 

78088 

48019 

04 

69568 

65496 

49033 

88577 

98606 

92156 

08846 

54912 

12691 

13170 

05 

68198 

69571 

34349 

73141 

4264O 

44721 

30462 

35075 

33475 

474O7 

06 

27974 

12609 

77428 

64441 

49008 

60489 

66780 

55499 

80842 

57706 

07 

50552 

20688 

02769 

63O37 

15494 

71784 

70559 

58158 

53437 

46216 

08 

74687 

02033 

98290 

62635 

88877 

28599 

63682 

35566 

03271 

05651 

09 

49303 

76629 

71897 

30990 

62923 

36686 

96167 

11492 

90333 

84501 

10 

89734 

39183 

52026 

14997 

15140 

18250 

62831 

51236 

61236 

09179 

11 

74042 

40747 

O2617 

11346 

01884 

82066 

55913 

72422 

13971 

64209 

12 

84706 

31375 

67053 

73367 

95349 

31074 

36908 

42782 

89690 

480O2 

13 

83664 

21365 

28882 

48926 

45435 

60577 

85270 

02777 

06878 

27561 

14 

47813 

74854 

73388 

11385 

99108 

97878 

32858 

17473 

07682 

20166 

15 

O0371 

56525 

38880 

53702 

09517 

47281 

15995 

98350 

25233 

79718 

16 

81182 

48434 

27431 

55806 

25389 

4O774 

72978 

16835 

65066 

28732 

17 

75242 

35904 

73077 

24537 

81354 

48902 

03478 

42867 

04552 

66034 

18 

96239 

80246 

O70OO 

09555 

55051 

49596 

44629 

88225 

28195 

44598 

19 

82988 

1744O 

85311 

03360 

38176 

51462 

86070 

03924 

84413 

92363 

20 

77599 

29143 

89088 

57593 

60036 

17297 

30923 

36224 

46327 

96266 

21 

61433 

33118 

53488 

82981 

44709 

63655 

64388 

00498 

14135 

57514 

22 

76008 

15045 

45440 

84062 

52363 

18079 

33726 

44301 

86246 

99727 

23 

26494 

76598 

85834 

10844 

5630O 

02244 

72118 

96510 

98388 

80161 

24 

46570 

88558 

77533 

33359 

07830 

84752 

53260 

46755 

36881 

98535 

25 

73995 

41532 

87933 

79930 

14310 

64833 

49020 

70067 

99726 

97007 

26 

93901 

38276 

75544 

19679 

62899 

11365 

22896 

42118 

77165 

08734 

27 

41925 

28215 

40966 

93501 

45446 

27913 

21708 

01788 

81404 

15119 

28 

80720 

02782 

24326 

41328 

10357 

86883 

80086 

77138 

57072 

12100 

29 

92596 

39416 

50362 

04423 

04561 

58179 

54188 

44978 

14322 

97056 

30 

39693 

58559 

45839 

47278 

38548 

38885 

19875 

26829 

86711 

57005 

31 

86923 

37863 

14340 

30929 

04079 

65274 

03030 

15106 

09362 

82972 

32 

99700 

79237 

18172 

58879 

56221 

65644 

33331 

87502 

32961 

40996 

33 

60248 

21953 

52321 

16984 

03252 

90433 

97304 

50181 

71026 

01946 

34 

29136 

71987 

03992 

67025 

31070 

78348 

47823 

11033 

13037 

47732 

35 

57471 

42913 

85212 

42319 

92901 

97727 

04775 

94396 

38154 

25238 

36 

57424 

93847 

03269 

56096 

95028 

14039 

76128 

63747 

27301 

65529 

37 

56768 

71694 

63361 

80836 

30841 

71875 

40944 

54827 

01887 

54822 

38 

70400 

81534 

02148 

41441 

26582 

27481 

84262 

14084 

42409 

62950 

39 

05454 

88418 

48646 

99565 

36635 

85496 

18894 

77271 

26894 

00889 

40 

80934 

56136 

47063 

96311 

19067 

59790 

08752 

68040 

85685 

83076 

41 

06919 

46237 

50676 

11238 

75637 

43086 

95323 

52867 

06891 

32089 

42 

00152 

23997 

41751 

74756 

50975 

75365 

70158 

67663 

51431 

46375 

43 

88505 

74625 

71783 

82511 

13661 

63178 

39291 

76796 

74736 

10980 

44 

64514 

80967 

33545 

09582 

86329 

58152 

05931 

35961 

70069 

12142 

45 

25280 

53007 

99651 

96366 

49378 

80971 

10419 

12981 

70572 

11575 

46 

71292 

63716 

93210 

59312 

39493 

24252 

54849 

29754 

41497 

79228 

47 

49734 

50498 

08974 

05904 

68172 

02864 

10994 

22482 

12912 

17920 

48 

43075 

09754 

71880 

92614 

99928 

94424 

86353 

87549 

94499 

11459 

49 

15116 

16643 

03981 

06566 

14050 

33671 

03814 

48856 

41267 

76252 

1 Reproduced from George W. Snedecor, Everyday Statistics. Copyright 1950.
Published by Wm. C. Brown Company, Dubuque, Iowa. By permission of the
author and publishers.
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25178 

O  C  1  fi/^. 

41773 

39926 

09843 

29694 

43801 

69276 

44707 

23455 

00 

rrOoVJO 

yj  _L\JO 

ooolo 

oOoOO 

37383 

76832 

37024 

06581 

r\i 

99068 
70983 

30898 
35453 
88359 

42152 
95583 

13923 
12078 
79848 

44987 
04913 
24101 

45122 
06083 
67502 

86515 
06645 
25692 

55836 
93310 
42496 

96165 
40016 

19650 

Ul 

04 

71181 

48289 

03153 

18779 

65702 

03612 

64608 

84071 

47588 

09982 

05 

91555 

87708 

70964 

43346 

27731 
56811 

08725 

75139 

77674 

44816 
82467 

07 

54307 

12188 

58089 

73745 

35569 

97352 

77301 

37684 

6991  R 

HQ 

63631 

23919 

06785 

13891 

89918 

76211 

09362 

34292 

17640 

65907 

Uo 

09 

46832 
05944 
28199 
08391 
29634 

90116 
05600 
13021 

98898 
39310 
59501 
00624 
96568 

49025 
95068 
15124 

97793 
61725 
73005 
33776 
55092 

84954 
44985 
44043 

36775 
81937 
11587 
01505 
31073 

71974 
16820 
97691 
76911 
92371 

15574 
85446 
90415 
45539 
51288 

09184 
51168 
84685 
32181 
33378 

12 
13 
14 

61509 

18842 

79201 

46451 

68594 

98120 

68110 

91062 

42095 

61839 

15 

87888 

23033 

69837 

65661 

15130 

44649 

42515 

83861 

50721 

36110 

16 

94585 

15218 

74838 

61809 

92293 

S5400 

46934 

08531 

70107 

65707 

j.  \j 
17 

82033 

93915 

34898 

79913 

70013 

27573 

39256 

35167 

35070 

47095 

18 

79131 

10022 

82199 

78976 

22702 

37936 

10445 

96846 

84927 

69745 

19 

79344 

39236 

41333 

11473 

15049 

47930 

99029 

97150 

82275 

55149 

20 

15384 

44585 

18773 

89733 

40779 

59664 

83328 

25162 

58758 

17761 

21 

38802 

90957 

32910 

97485 

10358 

88588 

95310 

22252 

19143 

69011 

22 

85874 

18400 

28151 

29541 

63706 

43197 

65726 

94117 

22169 

91806 

23 

26200 

72680 

12364 

46010 

92208 

59103 

60417 

45389 

56122 

85353 

24 

13772 

75282 

81418 

42188 

66529 

47981 

92548 

10079 

68179 

40915 

25 

91876 

07434 

96946 

98382 

97374 

34-4_44. 

17992 

42811 

01579 

48741 

26 

31721 

21713 

83632 

40605 

24227 

53219 

05482 

86768 

53239 

24812 

27 

92570 

53242 

98133 

84706 

78048 

29645 

79336 

66091 

05793 

25922 

28 

02880 

29307 

73734 

66448 

64739 

74645 

29562 

13999 

17492 

49891 

29 

80982 

14684 

31038 

85302 

98349 

57313 

86371 

33938 

10768 

60837 

30 

38000 

43364 

94825 

32413 

46781 

09685 

69058 

56644 

85531 

55173 

31 

14218 

94289 

79484 

61868 

40034 

22546 

68726 

14736 

89844 

13466 

32 

74358 

21940 

40280 

22233 

09123 

49375 

55094 

46113 

54046 

51771 

33 

39049 

14986 

94000 

26649 

13037 

34609 

45186 

89515 

63214 

66886 

34 

48727 

06300 

91486 

67316 

84576 

11100 

37580 

49629 

83224 

46321 

35 

22719 

29784 

40682 

96715 

40745 

57458 

70048 

48306 

50270 

87424 

36 

33980 

36769 

51977 

03689 

79071 

20279 

64787 

48877 

44063 

93733 

37 

23885 

66721 

16542 

12648 

65986 

43104 

45583 

75729 

35118 

58742 

38 

85190 

44068 

78477 

69133 

58983 

96504 

44232 

74809 

25266 

73872 

39 

33453 

36333 

45814 

78128 

55914 

89829 

43251 

41634 

48488 

49153 

40 

98236 

11489 

97240 

01678 

30779 

75214 

80039 

68895 

95271 

19654 

41 

21295 

53563 

43609 

48439 

87427 

88065 

09892 

58524 

43815 

31340 

42 

28335 

79849 

69842 

71669 

38770 

54445 

48736 

03242 

83181 

85403 

43 

95449 

35273 

62581 

85522 

35813 

34475 

97514 

72839 

10387 

31649 

44 

88167 

03878 

89405 

55461 

73248 

48620 

31732 

47317 

06252 

54652 

45 

86131 

62596 

98785 

02360 

54271 

26242 

93735 

20752 

17146 

18315 

46 

71134 

90264 

30126 

08586 

97497 

61678 

81940 

00907 

39096 

02082 

47 

02664 

53438 

76839 

52290 

77999 

05799 

93744 

16634 

84924 

31344 

48 

90664 

96876 

16663 

25608 

67140 

84619 

67167 

13192 

81774 

58619 

49 

[545] 


50 
51 
52 
53 
54 

55 
56 
57 
58 
59 

60 
61 
62 
63 
64 

65 
66 
67 
68 
69 

70 
71 

72 
73 
74 

75 
76 
77 
78 
79 

80 
81 
82 
83 
84 

85 
86 
87 
88 
89 

90 
91 
92 
93 
94 

95 
96 
97 
98 
99 


93873  86558  72524  02542  73184  37905  05882  15596  73646  50798 

08761  47547  02216  48086  56490  89959  69975  04500  23779  76697 

61270  98773  40298  26077  80396  08166  35723  61933  13985  19102 

73758  15578  95748  02967  35122  36539  72822  68241  34803  42457 

17132  32196  60523  00544  73700  70122  27962  85597  36011  79971 

26175  29794  44838  84414  82748  22246  70694  57953  39780  17791 

06004  04516  06210  03536  84451  30767  37928  26986  07396  64611 

34687  73753  36327  73704  61564  99434  90938  03967  97420  19913 

27865  08255  57859  04746  79700  68823  16002  58115  07589  12675 

89423  51114  90820  26786  77404  05795  49036  34686  98767  32284 

99030  80312  69745  87636  10058  84834  89485  08775  19041  61375 

02852  54339  45496  20587  85921  06763  68873  35367  42627  54973 

10850  42788  94737  74549  74296  13053  46816  32141  02533  25648 

38301  18507  33151  69434  80103  02603  61110  89395  67621  67025 

48181  95478  62739  90148  00156  09338  44558  53271  87549  45974 

23098  23720  76508  69083  56584  90423  21634  35999  09234  95116 

25104  82019  21120  06165  44324  77577  15774  44091  69687  67576 

22205  40198  86884  28103  57306  54915  03426  66700  45993  36668 

64975  05064  29617  40622  20330  18518  45312  57921  23188  82361 

58710  75278  47730  26093  16436  38868  76861  85914  14162  21984 

12140  72905  26022  07675  16362  34504  47740  39923  04081  03162 

73226  39840  47958  97249  14146  34543  76162  74158  59739  67447 

12320  86217  66162  70941  58940  58006  80731  66680  02183  94678 

41364  64156  23000  23188  64945  33815  32884  76955  56574  61666 

97881  80867  70117  72041  03554  29087  19767  71838  80545  61402 

88295  87271  82812  97588  09960  06312  03050  77332  25977  18385 

95321  89836  78230  46037  72483  87533  74571  88859  26908  55626 

24337  14264  30185  36753  22343  81737  62926  76494  93536  75502 

00718  66303  75009  91431  64245  61863  16738  23127  89435  45109 

38093  10328  96998  91386  34967  40407  48380  09115  59367  49596 

87661  31701  29974  56777  66751  35181  63887  95094  20056  84990 

87142  91818  51857  85061  17890  39057  44506  00969  32942  54794 

60634  27142  21199  50437  04685  70252  91453  75952  66753  50664 

73356  64431  05068  56334  34487  78253  67684  69916  63885  88491 

29889  11378  65915  66776  95034  81447  98035  16815  68432  63020 

48257  36438  48479  72173  31418  14035  84239  02032  40409  11715 

38425  29462  79880  45713  90049  01136  72426  25077  64361  94284 

48226  31868  38620  12135  28346  17552  03293  42618  44151  78438 

80189  30031  15435  76730  58565  29817  36775  64007  47912  16754 

33208  33475  95219  29832  74569  50667  90569  66717  46958  04820 

19750  48564  49690  43352  53884  80125  47795  99701  06800  22794 

62820  23174  71124  36040  34873  95650  79059  23894  58534  78296 

95737  34362  81520  79481  26442  37826  76866  01580  83713  94272 

64642  62961  37566  41064  69372  84369  92823  91391  61056  44495 

77636  60163  14915  50744  95611  99346  39741  04407  72940  87936 

43633  52102  93561  31010  11299  52661  79014  17910  88492  60753 

93686  41960  61280  96529  52924  87371  34855  67125  40279  10186 

23775  33402  28647  42314  51213  29116  26243  40243  32137  25177 

91325  64698  58868  63107  08993  96000  66854  11567  80604  72299 

58129  44367  31924  73586  24422  92799  28963  36444  01315  10226 


[546] 


ro  31209  83677  99115  94024  09286  58927  24078  16770 

58108  29344  11825  51955  50618  99753  02200  50503  32466  50055 

71545  42326  66429  93607  55276  85482  24449  41764  19884  46443 

93303  90557  79166  90097  01627  96690  77434  06402  05379  59549 

36731  37929  13079  83036  31525  35811  59131  65257  03731  §670? 


«nnc   nH£o   !229i   846°8   2339°   30433   °8240   85136   ^0060   43651 


«nnc   no 

65995  94208  68785  04370  44192  91852  01129  28739  08705 

09309  02836  10223  90814  92786  96747  46014  54765  76001 

63812  47615  1722°  27942  11785  4"33  03923  35432 


95407  95006  95421  20811  76761  47475  58865  06204  36543  81002 

22789  87011  61926  97996  10604  80855  48714  52754  98279  96467 

96783  18403  36729  18760  30810  73087  94565  68682  15792  60020 

68933  05665  12264  23954  01583  75411  04460  83939  66528  22576 

68794  13000  20066  98963  93483  51165  63358  12373  13877  37580 

40537  31604  60323  51235  65546  85117  15647  09617  73520  48525 

41249  42504  91773  81579  02882  74657  73765  10932  74607  83825 

08813  84525  30329  33144  76884  89996  07834  67266  96820  15128 

46609  30917  29996  10848  39555  09233  58988  82131  69232  76762 

68543  69424  92072  57937  05563  80727  67053  35431  00881  56541 

09926  84219  30089  08843  24998  27105  18397  79071  40738  73876 

30515  76316  49597  37900  98604  05857  51729  19006  15239  27129 

21611  26346  04877  71584  55724  39616  64648  36811  60915  34108 

47410  83767  56454  96768  27001  83712  01245  27256  57991  75758 

18572  31214  41015  64110  61807  72472  78059  69701  78681  17356 

28078  02819  02459  33308  96540  15817  78694  81476  87856  99737 

56644  50430  34562  75842  67724  02918  55603  55195  88219  39676 

27331  48055  18928  47763  61966  64507  06559  81329  29481  03660 

32080  21524  32929  07739  00836  39497  94476  27433  96857  52987 

27027  69762  65362  90214  89572  52054  43067  73017  87664  03293 

56471  68839  09969  45853  72627  71793  49920  64544  71874  74053 

22689  19799  18870  49272  74783  38777  76176  40961  18089  32499 

71263  82247  66684  90239  67686  48963  30842  59354  33551  87966 

64084  57386  89278  27187  52142  96305  87393  80164  95518  82742 

23121  10194  09911  37062  43446  09107  47156  70179  00858  92326 

78906  48080  76745  65814  51167  87755  66884  12718  14951  47937 

87257  26005  21544  37223  53288  72056  96396  67099  49416  91891 

39529  98126  33694  29025  94308  24426  63072  51444  04718  49891 

89632  11606  87159  89408  06295  31055  15530  46432  49871  37982 

23708  98919  14407  53722  58779  92849  04176  24870  56688  25405 

51445  46758  42024  27940  64237  10086  95601  53923  85209  79385 

23849  65272  24743  39960  27313  99925  29743  87270  05773  21797 

78613  15441  34568  57398  25872  61792  94599  60944  90908  38948 

90694  27996  94181  87428  41135  29461  72716  68956  67871  72459 

96772  86829  36403  40087  67456  21071  39039  91937  45280  00066 

24527  40701  56894  73327  00789  97573  09303  41704  05772  95372 

31596  70876  46807  06741  29352  23829  52465  00336  24155  61871 

31613  99249  17260  05242  19535  52702  64761  66694  06150  13820 

02911  09514  50864  80622  20017  59019  43450  75942  08567  40547 

02484  74068  04671  19646  41951  05111  34013  57443  87481  48994 

69259  75535  73007  15236  01572  44870  53280  25132  70276  87334 
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54 
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60 
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71 
72 
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APPENDIX  81 

CONTROL  CHART  CONSTANTS 

1  Reprinted  from  ASTM  Manual  on  Quality  Control  of  Materials,  Special 
Technical  Publication  15-C,  Table  B2,  American  Society  for  Testing  Materials, 
Philadelphia,  1951.  By  permission  of  the  authors  and  publishers. 
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APPENDIX    91 

NUMBER  OF  OBSERVATIONS  FOR 
t-TEST  OF  MEAN2 

1  Reproduced  from  Table  E  of  Owen  L.  Davies,   The  Design  and  Analysis  of 
Industrial   Experiments,    second   ed.,    Oliver   and    Boyd,    Edinburgh,    1956.    By 
permission  of  the  author  and  publishers. 

2 The entries in this table show the number of observations needed in a t-test
of the significance of a mean in order to control the probabilities of errors of the
first and second kinds at α and β, respectively.
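
As a rough cross-check on entries of this kind, the sketch below computes a large-sample (normal) approximation to the number of observations. It is not the method used to build the table: it assumes the standard deviation is known, so the tabulated t-test values can be expected to run somewhat larger. The function name and the argument `d` (the difference to be detected, expressed in units of the standard deviation) are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def approx_n_one_sample(d, alpha, beta):
    """Normal-approximation sample size for a single-sided test of a mean.

    d     -- difference to detect, in standard-deviation units (assumed name)
    alpha -- probability of an error of the first kind (single-sided)
    beta  -- probability of an error of the second kind
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # upper alpha point
    z_beta = std_normal.inv_cdf(1.0 - beta)    # upper beta point
    # n is at least ((z_alpha + z_beta) / d)^2, rounded up to a whole observation
    return ceil(((z_alpha + z_beta) / d) ** 2)


if __name__ == "__main__":
    print(approx_n_one_sample(1.0, 0.025, 0.05))
```

For a double-sided test at level α, the same sketch would be called with `alpha / 2`.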
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Level of t-Test

Single-sided test | α = 0.005                | α = 0.01                 | α = 0.025                | α = 0.05
Double-sided test | α = 0.01                 | α = 0.02                 | α = 0.05                 | α = 0.1
β                 | 0.01 0.05  0.1  0.2  0.5 | 0.01 0.05  0.1  0.2  0.5 | 0.01 0.05  0.1  0.2  0.5 | 0.01 0.05  0.1  0.2  0.5
Value of D = δ/σ  |
0.05              |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    —
0.10              |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    —
0.15              |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —  122
0.20              |    —    —    —    —    — |    —    —    —    —  139 |    —    —    —    —   99 |    —    —    —    —   70
0.25              |    —    —    —    —  110 |    —    —    —    —   90 |    —    —    —  128   64 |    —    —  139  101   45
0.30              |    —    —    —  134   78 |    —    —    —  115   63 |    —    —  119   90   45 |    —  122   97   71   32
0.35              |    —    —  125   99   58 |    —    —  109   85   47 |    —  109   88   67   34 |    —   90   72   52   24
0.40              |    —  115   97   77   45 |    —  101   85   66   37 |  117   84   68   51   26 |  101   70   55   40   19
0.45              |    —   92   77   62   37 |  110   81   68   53   30 |   93   67   54   41   21 |   80   55   44   33   15
0.50              |  100   75   63   51   30 |   90   66   55   43   25 |   76   54   44   34   18 |   65   45   36   27   13
0.55              |   83   63   53   42   26 |   75   55   46   36   21 |   63   45   37   28   15 |   54   38   30   22   11
0.60              |   71   53   45   36   22 |   63   47   39   31   18 |   53   38   32   24   13 |   46   32   26   19    9
0.65              |   61   46   39   31   20 |   55   41   34   27   16 |   46   33   27   21   12 |   39   28   22   17    8
0.70              |   53   40   34   28   17 |   47   35   30   24   14 |   40   29   24   19   10 |   34   24   19   15    8
0.75              |   47   36   30   25   16 |   42   31   27   21   13 |   35   26   21   16    9 |   30   21   17   13    7
0.80              |   41   32   27   22   14 |   37   28   24   19   12 |   31   22   19   15    9 |   27   19   15   12    6
0.85              |   37   29   24   20   13 |   33   25   21   17   11 |   28   21   17   13    8 |   24   17   14   11    6
0.90              |   34   26   22   18   12 |   29   23   19   16   10 |   25   19   16   12    7 |   21   15   13   10    5
0.95              |   31   24   20   17   11 |   27   21   18   14    9 |   23   17   14   11    7 |   19   14   11    9    5
1.00              |   28   22   19   16   10 |   25   19   16   13    9 |   21   16   13   10    6 |   18   13   11    8    5
1.1               |   24   19   16   14    9 |   21   16   14   12    8 |   18   13   11    9    6 |   15   11    9    7    —
1.2               |   21   16   14   12    8 |   18   14   12   10    7 |   15   12   10    8    5 |   13   10    8    6    —
1.3               |   18   15   13   11    8 |   16   13   11    9    6 |   14   10    9    7    — |   11    8    7    6    —
1.4               |   16   13   12   10    7 |   14   11   10    9    6 |   12    9    8    7    — |   10    8    7    5    —
1.5               |   15   12   11    9    7 |   13   10    9    8    6 |   11    8    7    6    — |    9    7    6    —    —
1.6               |   13   11   10    8    6 |   12   10    9    7    5 |   10    8    7    6    — |    8    6    6    —    —
1.7               |   12   10    9    8    6 |   11    9    8    7    — |    9    7    6    5    — |    8    6    5    —    —
1.8               |   12   10    9    8    6 |   10    8    7    7    — |    8    7    6    —    — |    7    6    —    —    —
1.9               |   11    9    8    7    6 |   10    8    7    6    — |    8    6    6    —    — |    7    5    —    —    —
2.0               |   10    8    8    7    5 |    9    7    7    6    — |    7    6    5    —    — |    6    —    —    —    —
2.1               |   10    8    7    7    — |    8    7    6    6    — |    7    6    —    —    — |    6    —    —    —    —
2.2               |    9    8    7    6    — |    8    7    6    5    — |    7    6    —    —    — |    6    —    —    —    —
2.3               |    9    7    7    6    — |    8    6    6    —    — |    6    5    —    —    — |    5    —    —    —    —
2.4               |    8    7    7    6    — |    7    6    6    —    — |    6    —    —    —    — |    —    —    —    —    —
2.5               |    8    7    6    6    — |    7    6    6    —    — |    6    —    —    —    — |    —    —    —    —    —
3.0               |    7    6    6    5    — |    6    5    5    —    — |    5    —    —    —    — |    —    —    —    —    —
3.5               |    6    5    5    —    — |    5    —    —    —    — |    —    —    —    —    — |    —    —    —    —    —
4.0               |    6    —    —    —    — |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    —
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APPENDIX  101 

NUMBER  OF  OBSERVATIONS  FOR 
t-TEST  OF  DIFFERENCE  BETWEEN 
TWO  MEANS2 

1 Reproduced from Table E.1 of Owen L. Davies, The Design and Analysis of
Industrial Experiments, second ed., Oliver and Boyd, Edinburgh, 1956. By
permission of the author and publishers.

2 The entries in this table show the number of observations needed in a t-test
of the significance of the difference between two means in order to control the
probabilities of the errors of the first and second kinds at α and β, respectively.
"It should be noted that the entries in the table show the number of observations
needed in each of two samples of equal size."
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Level of t-Test

Single-sided test | α = 0.005                | α = 0.01                 | α = 0.025                | α = 0.05
Double-sided test | α = 0.01                 | α = 0.02                 | α = 0.05                 | α = 0.1
β                 | 0.01 0.05  0.1  0.2  0.5 | 0.01 0.05  0.1  0.2  0.5 | 0.01 0.05  0.1  0.2  0.5 | 0.01 0.05  0.1  0.2  0.5
Value of D = δ/σ  |
0.05              |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    —
0.10              |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    —
0.15              |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    —
0.20              |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —  137
0.25              |    —    —    —    —    — |    —    —    —    —    — |    —    —    —    —  124 |    —    —    —    —   88
0.30              |    —    —    —    —    — |    —    —    —    —  123 |    —    —    —    —   87 |    —    —    —    —   61
0.35              |    —    —    —    —  110 |    —    —    —    —   90 |    —    —    —    —   64 |    —    —    —  102   45
0.40              |    —    —    —    —   85 |    —    —    —    —   70 |    —    —    —  100   50 |    —    —  108   78   35
0.45              |    —    —    —  118   68 |    —    —    —  101   55 |    —    —  105   79   39 |    —  108   86   62   28
0.50              |    —    —    —   96   55 |    —    —  106   82   45 |    —  106   86   64   32 |    —   88   70   51   23
0.55              |    —    —  101   79   46 |    —  106   88   68   38 |    —   87   71   53   27 |  112   73   58   42   19
0.60              |    —  101   85   67   39 |    —   90   74   58   32 |  104   74   60   45   23 |   89   61   49   36   16
0.65              |    —   87   73   57   34 |  104   77   64   49   27 |   88   63   51   39   20 |   76   52   42   30   14
0.70              |  100   75   63   50   29 |   90   66   55   43   24 |   76   55   44   34   17 |   66   45   36   26   12
0.75              |   88   66   55   44   26 |   79   58   48   38   21 |   67   48   39   29   15 |   57   40   32   23   11
0.80              |   77   58   49   39   23 |   70   51   43   33   19 |   59   42   34   26   14 |   50   35   28   21   10
0.85              |   69   51   43   35   21 |   62   46   38   30   17 |   52   37   31   23   12 |   45   31   25   18    9
0.90              |   62   46   39   31   19 |   55   41   34   27   15 |   47   34   27   21   11 |   40   28   22   16    8
0.95              |   55   42   35   28   17 |   50   37   31   24   14 |   42   30   25   19   10 |   36   25   20   15    7
1.00              |   50   38   32   26   15 |   45   33   28   22   13 |   38   27   23   17    9 |   33   23   18   14    7
1.1               |   42   32   27   22   13 |   38   28   23   19   11 |   32   23   19   14    8 |   27   19   15   12    6
1.2               |   36   27   23   18   11 |   32   24   20   16    9 |   27   20   16   12    7 |   23   16   13   10    5
1.3               |   31   23   20   16   10 |   28   21   17   14    8 |   23   17   14   11    6 |   20   14   11    9    5
1.4               |   27   20   17   14    9 |   24   18   15   12    8 |   20   15   12   10    6 |   17   12   10    8    4
1.5               |   24   18   15   13    8 |   21   16   14   11    7 |   18   13   11    9    5 |   15   11    9    7    4
1.6               |   21   16   14   11    7 |   19   14   12   10    6 |   16   12   10    8    5 |   14   10    8    6    4
1.7               |   19   15   13   10    7 |   17   13   11    9    6 |   14   11    9    7    4 |   12    9    7    6    3
1.8               |   17   13   11   10    6 |   15   12   10    8    5 |   13   10    8    6    4 |   11    8    7    5    —
1.9               |   16   12   11    9    6 |   14   11    9    8    5 |   12    9    7    6    4 |   10    7    6    5    —
2.0               |   14   11   10    8    6 |   13   10    9    7    5 |   11    8    7    6    4 |    9    7    6    4    —
2.1               |   13   10    9    8    5 |   12    9    8    7    5 |   10    8    6    5    3 |    8    6    5    4    —
2.2               |   12   10    8    7    5 |   11    9    7    6    4 |    9    7    6    5    — |    8    6    5    4    —
2.3               |   11    9    8    7    5 |   10    8    7    6    4 |    9    7    6    5    — |    7    5    5    4    —
2.4               |   11    9    8    6    5 |   10    8    7    6    4 |    8    6    5    4    — |    7    5    4    4    —
2.5               |   10    8    7    6    4 |    9    7    6    5    4 |    8    6    5    4    — |    6    5    4    3    —
3.0               |    8    6    6    5    4 |    7    6    5    4    3 |    6    5    4    4    — |    5    4    3    —    —
3.5               |    6    5    5    4    3 |    6    5    4    4    — |    5    4    4    3    — |    4    3    —    —    —
4.0               |    6    5    4    4    — |    5    4    4    3    — |    4    4    3    —    — |    4    —    —    —    —
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APPENDIX    1 


NUMBER  OF  OBSERVATIONS  REQUIRED 
FOR  THE  COMPARISON  OF  A 
POPULATION  VARIANCE  WITH  A 
STANDARD  VALUE  USING  THE  χ²-TEST 


    | α = 0.01                            | α = 0.05
  ν |   β=0.01   β=0.05    β=0.1    β=0.5 |   β=0.01   β=0.05    β=0.1    β=0.5
  1 |   42,240    1,687    420.2    14.58 |   24,450    977.0    243.3    8.444
  2 |    458.2    89.78    43.71    6.644 |    298.1    58.40    28.43    4.322
  3 |    98.79    32.24    19.41    4.795 |    68.05    22.21    13.37    3.303
  4 |    44.69    18.68    12.48    3.955 |    31.93    13.35    8.920    2.826
  5 |    27.22    13.17    9.369    3.467 |    19.97    9.665    6.875    2.544
  6 |    19.28    10.28    7.628    3.144 |    14.44    7.699    5.713    2.354
  7 |    14.91    8.524    6.521    2.911 |    11.35    6.491    4.965    2.217
  8 |    12.20    7.352    5.757    2.736 |    9.418    5.675    4.444    2.112
  9 |    10.38    6.516    5.198    2.597 |    8.103    5.088    4.059    2.028
 10 |    9.072    5.890    4.770    2.484 |    7.156    4.646    3.763    1.960
 12 |    7.343    5.017    4.159    2.312 |    5.889    4.023    3.335    1.854
 15 |    5.847    4.211    3.578    2.132 |    4.780    3.442    2.925    1.743
 20 |    4.548    3.462    3.019    1.943 |    3.802    2.895    2.524    1.624
 24 |    3.959    3.104    2.745    1.842 |    3.354    2.630    2.326    1.560
 30 |    3.403    2.752    2.471    1.735 |    2.927    2.367    2.125    1.492
 40 |    2.874    2.403    2.192    1.619 |    2.516    2.103    1.919    1.418
 60 |    2.358    2.046    1.902    1.490 |    2.110    1.831    1.702    1.333
120 |    1.829    1.661    1.580    1.332 |    1.686    1.532    1.457    1.228
  ∞ |    1.000    1.000    1.000    1.000 |    1.000    1.000    1.000    1.000

1  Reproduced  from  Table  G  of  Owen  L.  Davies,  The  Design  and  Analysis  of 
Industrial    Experiments,    second   ed.,    Oliver   and    Boyd,    Edinburgh,    1956.    By 
permission  of  the  author  and  publishers. 

2 The entries in this table show the value of the ratio R of the population
variance σ² to a standard variance σ₀² which will be undetected with probability
β in a χ²-test at the 100α per cent significance level of an estimate s² of σ² based
on ν degrees of freedom.
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NUMBER  OF  OBSERVATIONS  REQUIRED 
FOR  THE  COMPARISON  OF  TWO 
POPULATION  VARIANCES  USING 
THE  F-TEST2 


    | α = 0.01                              | α = 0.05                            | α = 0.5
  ν |     β=0.01   β=0.05    β=0.1    β=0.5 |   β=0.01   β=0.05    β=0.1    β=0.5 |   β=0.01   β=0.05    β=0.1    β=0.5
  1 | 16,420,000  654,200  161,500    4,052 |  654,200   26,070    6,436    161.5 |    4,052    161.5    39.85    1.000
  2 |      9,801    1,881    891.0    99.00 |    1,881    361.0    171.0    19.00 |    99.00    19.00    9.000    1.000
  3 |      867.7    273.3    158.8    29.46 |    273.3    86.06    50.01    9.277 |    29.46    9.277    5.391    1.000
  4 |      255.3    102.1    65.62    15.98 |    102.1    40.81    26.24    6.388 |    15.98    6.388    4.108    1.000
  5 |      120.3    55.39    37.87    10.97 |    55.39    25.51    17.44    5.050 |    10.97    5.050    3.453    1.000
  6 |      71.67    36.27    25.86    8.466 |    36.27    18.35    13.09    4.284 |    8.466    4.284    3.056    1.000
  7 |      48.90    26.48    19.47    6.993 |    26.48    14.34    10.55    3.787 |    6.993    3.787    2.786    1.000
  8 |      36.35    20.73    15.61    6.029 |    20.73    11.82    8.902    3.438 |    6.029    3.438    2.589    1.000
  9 |      28.63    17.01    13.06    5.351 |    17.01    10.11    7.757    3.179 |    5.351    3.179    2.440    1.000
 10 |      23.51    14.44    11.26    4.849 |    14.44    8.870    6.917    2.978 |    4.849    2.978    2.323    1.000
 12 |      17.27    11.16    8.923    4.155 |    11.16    7.218    5.769    2.687 |    4.155    2.687    2.147    1.000
 15 |      12.41    8.466    6.946    3.522 |    8.466    5.777    4.740    2.404 |    3.522    2.404    1.972    1.000
 20 |      8.630    6.240    5.270    2.938 |    6.240    4.512    3.810    2.124 |    2.938    2.124    1.794    1.000
 24 |      7.071    5.275    4.526    2.659 |    5.275    3.935    3.376    1.984 |    2.659    1.984    1.702    1.000
 30 |      5.693    4.392    3.833    2.386 |    4.392    3.389    2.957    1.841 |    2.386    1.841    1.606    1.000
 40 |      4.470    3.579    3.183    2.114 |    3.579    2.866    2.549    1.693 |    2.114    1.693    1.506    1.000
 60 |      3.372    2.817    2.562    1.836 |    2.817    2.354    2.141    1.534 |    1.836    1.534    1.396    1.000
120 |      2.350    2.072    1.939    1.533 |    2.072    1.828    1.710    1.352 |    1.533    1.352    1.265    1.000
  ∞ |      1.000    1.000    1.000    1.000 |    1.000    1.000    1.000    1.000 |    1.000    1.000    1.000    1.000

1 Reproduced from Table H of Owen L. Davies, The Design and Analysis of
Industrial Experiments, second ed., Oliver and Boyd, Edinburgh, 1956. By
permission of the author and publishers.

2 The entries in this table show the value of the ratio R of two population
variances σ₁²/σ₂² which will be undetected with probability β in a variance ratio
test at the 100α per cent significance level of the ratio s₁²/s₂² of estimates of the
two variances, both being based on ν degrees of freedom.
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CRITICAL  VALUES  OF  r  FOR  THE 
SIGN  TEST2 


  n    1%   5%  10%  25%  |    n    1%   5%  10%  25%
  1     —    —    —    —  |   46    13   15   16   18
  2     —    —    —    —  |   47    14   16   17   19
  3     —    —    —    0  |   48    14   16   17   19
  4     —    —    —    0  |   49    15   17   18   19
  5     —    —    0    0  |   50    15   17   18   20
  6     —    0    0    1  |   51    15   18   19   20
  7     —    0    0    1  |   52    16   18   19   21
  8     0    0    1    1  |   53    16   18   20   21
  9     0    1    1    2  |   54    17   19   20   22
 10     0    1    1    2  |   55    17   19   20   22
 11     0    1    2    3  |   56    17   20   21   23
 12     1    2    2    3  |   57    18   20   21   23
 13     1    2    3    3  |   58    18   21   22   24
 14     1    2    3    4  |   59    19   21   22   24
 15     2    3    3    4  |   60    19   21   23   25
 16     2    3    4    5  |   61    20   22   23   25
 17     2    4    4    5  |   62    20   22   24   25
 18     3    4    5    6  |   63    20   23   24   26
 19     3    4    5    6  |   64    21   23   24   26
 20     3    5    5    6  |   65    21   24   25   27
 21     4    5    6    7  |   66    22   24   25   27
 22     4    5    6    7  |   67    22   25   26   28
 23     4    6    7    8  |   68    22   25   26   28
 24     5    6    7    8  |   69    23   25   27   29
 25     5    7    7    9  |   70    23   26   27   29
 26     6    7    8    9  |   71    24   26   28   30
 27     6    7    8   10  |   72    24   27   28   30
 28     6    8    9   10  |   73    25   27   28   31
 29     7    8    9   10  |   74    25   28   29   31
 30     7    9   10   11  |   75    25   28   29   32
 31     7    9   10   11  |   76    26   28   30   32
 32     8    9   10   12  |   77    26   29   30   32
 33     8   10   11   12  |   78    27   29   31   33
 34     9   10   11   13  |   79    27   30   31   33
 35     9   11   12   13  |   80    28   30   32   34
 36     9   11   12   14  |   81    28   31   32   34
 37    10   12   13   14  |   82    28   31   33   35
 38    10   12   13   14  |   83    29   32   33   35
 39    11   12   13   15  |   84    29   32   33   36
 40    11   13   14   15  |   85    30   32   34   36
 41    11   13   14   16  |   86    30   33   34   37
 42    12   14   15   16  |   87    31   33   35   37
 43    12   14   15   17  |   88    31   34   35   38
 44    13   15   16   17  |   89    31   34   36   38
 45    13   15   16   18  |   90    32   35   36   39

*  For values of n larger than 90, approximate values of r may be found by taking the
nearest integer less than $(n-1)/2 - k\sqrt{n+1}$, where k is 1.2879, 0.9800, 0.8224, 0.5752
for the 1, 5, 10, 25% values, respectively.
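
The large-sample rule in the footnote above can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionary of k constants are hypothetical, and "nearest integer less than" is read here as the largest integer strictly below the computed value.

```python
import math

# k constants quoted in the footnote, keyed by significance level
# (1, 5, 10, and 25 per cent).
K_BY_LEVEL = {0.01: 1.2879, 0.05: 0.9800, 0.10: 0.8224, 0.25: 0.5752}


def approx_sign_test_r(n: int, level: float) -> int:
    """Approximate critical r: integer just below (n-1)/2 - k*sqrt(n+1)."""
    x = (n - 1) / 2 - K_BY_LEVEL[level] * math.sqrt(n + 1)
    # Largest integer strictly less than x (assumed reading of the footnote).
    return math.ceil(x) - 1


if __name__ == "__main__":
    for level in (0.01, 0.05, 0.10, 0.25):
        print(level, approx_sign_test_r(90, level))
```

Although the footnote intends the rule for n larger than 90, evaluating it at n = 90 gives 32, 35, 36, and 39, which agrees with the last tabulated row and is a convenient sanity check.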

1  Reproduced  from  W.  J.  Dixon  and  F.  J.  Massey,  Jr.,  An  Introduction  to 
Statistical  Analysis,  McGraw-Hill  Book  Company,  Inc.,  New  York,  1951,  p.  324. 
By  permission  of  the  authors  and  publishers. 
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TABLE  OF  CRITICAL  VALUES  OF  T  IN 
THE  WILCOXON  SIGNED  RANK  TEST2 


           Level of Significance for One-Tailed Test
              .025        .01         .005
           Level of Significance for Two-Tailed Test
   n          .05         .02         .01
   6            0           -           -
   7            2           0           -
   8            4           2           0
   9            6           3           2
  10            8           5           3
  11           11           7           5
  12           14          10           7
  13           17          13          10
  14           21          16          13
  15           25          20          16
  16           30          24          20
  17           35          28          23
  18           40          33          28
  19           46          38          32
  20           52          43          38
  21           59          49          43
  22           66          56          49
  23           73          62          55
  24           81          69          61
  25           89          77          68
   *

*  For n > 25, T is approximately normally distributed with mean $n(n+1)/4$ and
variance $n(n+1)(2n+1)/24$.

1  Adapted from Table I of F. Wilcoxon, Some Rapid Approximate Statistical
Procedures, American Cyanamid Co., Stamford, Conn., 1949, p. 13. By permission.

2  The values of T given in the table are critical values associated with selected
values of n. Any value of T which is less than or equal to the tabulated value
is significant at the indicated level of significance.
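
For n larger than 25 the footnote's normal approximation can be applied directly. The sketch below is a minimal illustration under that assumption: the function name is hypothetical, no continuity or tie correction is applied, and the lower-tail probability is obtained from the standard library's complementary error function.

```python
import math


def wilcoxon_normal_approx(t_stat: float, n: int):
    """Return (mean, variance, z, lower-tail p) for the signed rank statistic T."""
    mean = n * (n + 1) / 4                    # mean of T under the null hypothesis
    variance = n * (n + 1) * (2 * n + 1) / 24  # variance of T under the null hypothesis
    z = (t_stat - mean) / math.sqrt(variance)
    # P(Z <= z) for a standard normal Z; small values of T give small p.
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))
    return mean, variance, z, p_lower


if __name__ == "__main__":
    print(wilcoxon_normal_approx(120, 30))
```

Because small values of T are the significant ones in the table, the lower-tail probability is the natural quantity to compare with a one-tailed level of significance.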
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TABLE  OF  CRITICAL  VALUES  OF  r  IN 
THE  RUN  TEST2 


TABLE  1 


n1\n2   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
    2                                           2   2   2   2   2   2   2   2   2
    3                   2   2   2   2   2   2   2   2   2   3   3   3   3   3   3
    4               2   2   2   3   3   3   3   3   3   3   3   4   4   4   4   4
    5           2   2   3   3   3   3   3   4   4   4   4   4   4   4   5   5   5
    6       2   2   3   3   3   3   4   4   4   4   5   5   5   5   5   5   6   6
    7       2   2   3   3   3   4   4   5   5   5   5   5   6   6   6   6   6   6
    8       2   3   3   3   4   4   5   5   5   6   6   6   6   6   7   7   7   7
    9       2   3   3   4   4   5   5   5   6   6   6   7   7   7   7   8   8   8
   10       2   3   3   4   5   5   5   6   6   7   7   7   7   8   8   8   8   9
   11       2   3   4   4   5   5   6   6   7   7   7   8   8   8   9   9   9   9
   12   2   2   3   4   4   5   6   6   7   7   7   8   8   8   9   9   9  10  10
   13   2   2   3   4   5   5   6   6   7   7   8   8   9   9   9  10  10  10  10
   14   2   2   3   4   5   5   6   7   7   8   8   9   9   9  10  10  10  11  11
   15   2   3   3   4   5   6   6   7   7   8   8   9   9  10  10  11  11  11  12
   16   2   3   4   4   5   6   6   7   8   8   9   9  10  10  11  11  11  12  12
   17   2   3   4   4   5   6   7   7   8   9   9  10  10  11  11  11  12  12  13
   18   2   3   4   5   5   6   7   8   8   9   9  10  10  11  11  12  12  13  13
   19   2   3   4   5   6   6   7   8   8   9  10  10  11  11  12  12  13  13  13
   20   2   3   4   5   6   6   7   8   9   9  10  10  11  12  12  13  13  13  14

1  Adapted from Frieda S. Swed and C. Eisenhart, "Tables for testing randomness
of grouping in a sequence of alternatives," Ann. Math. Stat., Vol. 14, 1943,
pp. 83-86. By permission of the authors and publishers.

2  The values of r given in Tables 1 and 2 are various critical values of r associated
with selected values of n1 and n2. For the one-sample run test, any value
of r which is equal to or less than the value shown in Table 1 or equal to or
greater than the value shown in Table 2 is significant at the 5 per cent level.
For the two-sample run test, any value of r which is equal to or less than the
value shown in Table 1 is significant at the 5 per cent level.
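
The decision rule in footnote 2 can be illustrated with a short Python sketch. The function names are hypothetical, and the two critical values are assumed to be looked up by the reader in Tables 1 and 2 for the observed n1 and n2; the code only counts runs and applies the comparison.

```python
def count_runs(seq) -> int:
    """Number of runs: maximal blocks of identical adjacent symbols."""
    items = list(seq)
    if not items:
        return 0
    return 1 + sum(1 for a, b in zip(items, items[1:]) if a != b)


def one_sample_run_test(seq, lower: int, upper: int):
    """Return (r, significant) using critical values read from Tables 1 and 2.

    lower: Table 1 entry for (n1, n2); upper: Table 2 entry for (n1, n2).
    """
    r = count_runs(seq)
    return r, (r <= lower or r >= upper)


if __name__ == "__main__":
    # n1 = n2 = 10: Table 1 gives 6 and Table 2 gives 16.
    print(one_sample_run_test("AABABBBABAABABBABABA", lower=6, upper=16))
```

For the two-sample run test only the lower comparison applies, so the same `count_runs` helper can be used with the Table 1 value alone.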
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TABLE  2 


n1\n2   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
    2
    3
    4               9   9
    5           9  10  10  11  11
    6           9  10  11  12  12  13  13  13  13
    7              11  12  13  13  14  14  14  14  15  15  15
    8              11  12  13  14  14  15  15  16  16  16  16  17  17  17  17  17
    9                  13  14  14  15  16  16  16  17  17  18  18  18  18  18  18
   10                  13  14  15  16  16  17  17  18  18  18  19  19  19  20  20
   11                  13  14  15  16  17  17  18  19  19  19  20  20  20  21  21
   12                  13  14  16  16  17  18  19  19  20  20  21  21  21  22  22
   13                      15  16  17  18  19  19  20  20  21  21  22  22  23  23
   14                      15  16  17  18  19  20  20  21  22  22  23  23  23  24
   15                      15  16  18  18  19  20  21  22  22  23  23  24  24  25
   16                          17  18  19  20  21  21  22  23  23  24  25  25  25
   17                          17  18  19  20  21  22  23  23  24  25  25  26  26
   18                          17  18  19  20  21  22  23  24  25  25  26  26  27
   19                          17  18  20  21  22  23  23  24  25  26  26  27  27
   20                          17  18  20  21  22  23  24  25  25  26  27  27  28
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TABLE  OF  CRITICAL  VALUES  OF  D  IN 
THE  KOLMOGOROV-SMIRNOV 
GOODNESS  OF  FIT  TEST2 


Sample Size      Level of Significance for D = Maximum |F(x) - Sn(x)|
    (n)           .20        .15        .10        .05        .01
     1           .900       .925       .950       .975       .995
     2           .684       .726       .776       .842       .929
     3           .565       .597       .642       .708       .828
     4           .494       .525       .564       .624       .733
     5           .446       .474       .510       .565       .669
     6           .410       .436       .470       .521       .618
     7           .381       .405       .438       .486       .577
     8           .358       .381       .411       .457       .543
     9           .339       .360       .388       .432       .514
    10           .322       .342       .368       .410       .490
    11           .307       .326       .352       .391       .468
    12           .295       .313       .338       .375       .450
    13           .284       .302       .325       .361       .433
    14           .274       .292       .314       .349       .418
    15           .266       .283       .304       .338       .404
    16           .258       .274       .295       .328       .392
    17           .250       .266       .286       .318       .381
    18           .244       .259       .278       .309       .371
    19           .237       .252       .272       .301       .363
    20           .231       .246       .264       .294       .356
    25           .21        .22        .24        .27        .32
    30           .19        .20        .22        .24        .29
    35           .18        .19        .21        .23        .27
  Over 35      1.07/√n    1.14/√n    1.22/√n    1.36/√n    1.63/√n

1  Adapted from F. J. Massey, Jr., "The Kolmogorov-Smirnov test for goodness
of fit," Jour. Amer. Stat. Assn., Vol. 46, 1951, pp. 68-78. By permission of the
author and publisher.

2  The  values  of  D  given  in  the  table  are  critical  values  associated  with  selected 
values  of  n.  Any  value  of  D  which  is  greater  than  or  equal  to  the  tabulated  value 
is  significant  at  the  indicated  level  of  significance. 
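
As a rough illustration of how D and the large-sample critical values might be computed, the sketch below evaluates the maximum distance between a hypothesized distribution function F(x) and the sample step function Sn(x). The function names are hypothetical; the coefficients are the "Over 35" entries of the table, each divided by the square root of n.

```python
import math

# Numerators of the "Over 35" row, keyed by level of significance.
LARGE_N_COEFF = {0.20: 1.07, 0.15: 1.14, 0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def ks_statistic(sample, cdf) -> float:
    """D = maximum |F(x) - Sn(x)|, checked on both sides of each jump of Sn."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Sn jumps from (i-1)/n to i/n at x, so both gaps must be examined.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_large_n(n: int, level: float) -> float:
    """Approximate critical D for n over 35."""
    return LARGE_N_COEFF[level] / math.sqrt(n)


if __name__ == "__main__":
    print(ks_statistic([0.25, 0.75], lambda x: x))
    print(ks_critical_large_n(100, 0.05))
```

An observed D at least as large as the critical value is significant at the chosen level, as stated in footnote 2.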
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APPENDIX    1?i 

PERCENTAGE POINTS OF PSEUDO t
AND F STATISTICS

TABLE 1-Table for Testing the Significance of the Deviation of the Mean
of a Small Sample (of Size n) From Some Pre-Assigned Value


   n     p = 0.95     0.975      0.99     0.995     0.999    0.9995
   2       3.157      6.353    15.910    31.828    159.16    318.31
   3       0.885-     1.304     2.111     3.008      6.77      9.58
   4        .529      0.717     1.023     1.316      2.29      2.85+
   5        .388       .507     0.685+    0.843      1.32      1.58
   6       0.312      0.399     0.523     0.628      0.92      1.07
   7        .263       .333      .429      .507       .71      0.82
   8        .230       .288      .366      .429       .59       .67
   9        .205-      .255+     .322      .374       .50       .57
  10        .186       .230      .288      .333       .44       .50
  11       0.170      0.210     0.262     0.302      0.40      0.44
  12        .158       .194      .241      .277       .36       .40
  13        .147       .181      .224      .256       .33       .37
  14        .138       .170      .209      .239       .31       .34
  15        .131       .160      .197      .224       .29       .32
  16       0.124      0.151     0.186     0.212      0.27      0.30
  17        .118       .144      .177      .201       .26       .28
  18        .113       .137      .168      .191       .24       .26
  19        .108       .131      .161      .182       .23       .25+
  20        .104       .126      .154      .175-      .22       .24

P{τ1 < value in table} = p
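
The statistic tabulated in Table 1 is not defined in this appendix. The sketch below assumes the form suggested by the title of Lord's paper, namely the deviation of the sample mean from the pre-assigned value divided by the sample range; the function name and the example data are hypothetical.

```python
def lord_tau1(sample, mu0: float) -> float:
    """Assumed pseudo t: (sample mean - mu0) / sample range."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    sample_range = max(sample) - min(sample)
    if sample_range == 0:
        raise ValueError("the sample range is zero")
    return (sum(sample) / n - mu0) / sample_range


if __name__ == "__main__":
    data = [10.1, 9.8, 10.4, 10.0, 10.2]  # hypothetical measurements, n = 5
    # Compare with the n = 5 row of Table 1 (for example .507 at p = 0.975).
    print(lord_tau1(data, 9.9))
```

Under this assumption the example value is about 0.33, which is below the n = 5 entry of .507, so the deviation would not be judged significant at that percentage point.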

1  Table 1 is reproduced with the permission of Professor E. S. Pearson from
E. Lord, "The Use of the Range in Place of the Standard Deviation in the t
Test," Biometrika, XXXIV (1947), 66.

Table 2 is reproduced with the permission of Professor E. S. Pearson from
E. Lord, "The Use of the Range in Place of the Standard Deviation in the t
Test," Biometrika, XXXIV (1947), 66.

Table 3 is reproduced from J. E. Walsh, "On the Range-Midrange Test and
Some Tests With Bounded Significance Levels," Ann. Math. Stat., XX (1949),
257. By permission of the author and publishers.

Table 4 is reproduced from R. F. Link, "On the Ratio of Two Ranges," Ann.
Math. Stat., XXI (1950), 112. By permission of the author and publishers.
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TABLE  2-Table  for  Testing  the  Significance  of  the  Difference  Between  the 
Means  of  Two  Small  Samples  of  Equal  Size  n 


   n     p = 0.95     0.975      0.99     0.995     0.999    0.9995
   2       1.161      1.713     2.776     3.958      8.90     12.62
   3        .487       .636      .857     1.046      1.63      2.09
   4        .322       .406      .523      .618       .87       .99
   5        .246       .306      .386      .448       .60       .67
   6        .202       .249      .310      .357       .47       .51
   7        .173       .213      .262      .300       .38       .42
   8        .153       .186      .229      .260       .33       .36
   9        .137       .167      .204      .232       .29       .32
  10        .125       .152      .185      .209       .26       .29
  11        .116       .140      .170      .192       .24       .26
  12        .107       .130      .157      .177       .22       .24
  13        .100       .121      .147      .165       .20       .22
  14        .094       .114      .138      .155       .19       .21
  15        .089       .108      .130      .146       .18       .19
  16        .085       .102      .123      .139       .17       .18
  17        .081       .097      .118      .132       .16       .17
  18        .077       .093      .112      .126       .15       .17
  19        .074       .089      .108      .121       .15       .16
  20        .071       .086      .103      .116       .14       .15

P{τ2 < value in table} = p
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TABLE   3-Cumulative  Percentage  Points  for 


Sample Size     p = .95    p = .975    p = .99    p = .995
     2           3.16        6.35      15.91       31.83
     3            .90        1.30       2.11        3.02
     4            .55         .74       1.04        1.37
     5            .42         .52        .71         .85
     6            .35         .43        .56         .66
     7            .30         .37        .47         .55
     8            .26         .33        .42         .47
     9            .24         .30        .38         .42
    10            .22         .27        .35         .39

p = P{τ3 < value in the table}

TABLE  4-Substitute  F-Ratio,  Cumulative  Percentage  Points 


p = .95

Sample Size for              Sample Size for Numerator
Denominator         2      3      4      5      6      7      8      9     10
      2          12.7   19.1     25     28     29     31     32     34     36
      3          3.19    4.4    5.0    5.7    6.2    6.6    6.9    7.2    7.4
      4          2.02    2.7    3.1    3.4    3.6    3.8    4.0    4.2    4.4
      5          1.61    2.1    2.4    2.6    2.8    2.9    3.0    3.1    3.2
      6          1.36    1.8    2.0    2.2    2.3    2.4    2.5    2.6    2.7
      7          1.26    1.6    1.8    1.9    2.0    2.1    2.2    2.3    2.4
      8          1.17    1.4    1.6    1.8    1.9    1.9    2.0    2.1    2.1
      9          1.10    1.3    1.5    1.6    1.7    1.8    1.9    1.9    2.0
     10          1.05    1.3    1.4    1.5    1.6    1.7    1.8    1.8    1.9

p = .975

Sample Size for              Sample Size for Numerator
Denominator         2      3      4      5      6      7      8      9     10
      2          25.5   38.2     52     57     60     62     64     67     68
      3          4.61    6.3    7.3    8.0    8.7    9.3    9.8   10.2   10.5
      4          2.72    3.5    4.0    4.4    4.7    5.0    5.2    5.4    5.6
      5          2.01    2.6    2.9    3.2    3.4    3.6    3.7    3.8    3.9
      6          1.67    2.1    2.4    2.6    2.8    2.9    3.0    3.1    3.2
      7          1.48    1.9    2.1    2.3    2.4    2.5    2.6    2.7    2.8
      8          1.36    1.7    1.9    2.0    2.2    2.3    2.3    2.4    2.5
      9          1.27    1.6    1.8    1.9    2.0    2.1    2.1    2.2    2.3
     10          1.21    1.5    1.6    1.8    1.9    1.9    2.0    2.0    2.1

p = .99

Sample Size for              Sample Size for Numerator
Denominator         2      3      4      5      6      7      8      9     10
      2          63.7     95    125    140    150    150    160    160    160
      3          7.37     10     12     13     14     15     15     16     17
      4          3.83    5.0    5.5    6.0    6.4    6.7    7.0    7.2    7.5
      5          2.64    3.4    3.8    4.1    4.3    4.6    4.7    4.9    5.0
      6          2.16    2.7    3.0    3.2    3.4    3.6    3.7    3.8    3.9
      7          1.87    2.3    2.6    2.8    2.9    3.0    3.1    3.2    3.3
      8          1.69    2.1    2.3    2.4    2.6    2.7    2.8    2.8    2.9
      9          1.56    1.9    2.1    2.2    2.3    2.4    2.5    2.6    2.6
     10          1.47    1.8    1.9    2.1    2.2    2.2    2.3    2.4    2.4

p = .995

Sample Size for              Sample Size for Numerator
Denominator         2      3      4      5      6      7      8      9     10
      2           127    191    230    250    260    270    280    290    290
      3          10.4     14     17     18     20     21     22     23     25
      4          4.85    6.1    7.0    7.6    8.1    8.5    8.8    9.3    9.6
      5          3.36    4.1    4.6    4.9    5.2    5.5    5.7    5.9    6.1
      6          2.67    3.1    3.5    3.8    4.0    4.1    4.3    4.5    4.6
      7          2.28    2.7    2.9    3.1    3.3    3.5    3.6    3.7    3.8
      8          2.03    2.3    2.6    2.7    2.9    3.0    3.1    3.2    3.3
      9          1.87    2.1    2.4    2.5    2.6    2.7    2.8    2.9    3.0
     10          1.75    2.0    2.2    2.3    2.4    2.5    2.6    2.6    2.7

p = P{R1/R2 < value in the table}
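
The substitute F-ratio of Table 4 is the ratio of two sample ranges, R1/R2. A minimal sketch of the computation follows; the function name and the example data are hypothetical, and the critical value is assumed to be read from the table for the two sample sizes.

```python
def range_ratio(numerator_sample, denominator_sample) -> float:
    """Substitute F-ratio: range of the first sample over range of the second."""
    r1 = max(numerator_sample) - min(numerator_sample)
    r2 = max(denominator_sample) - min(denominator_sample)
    if r2 == 0:
        raise ValueError("the denominator range is zero")
    return r1 / r2


if __name__ == "__main__":
    first = [1, 5, 3, 9, 2]   # hypothetical sample, n = 5, range 8
    second = [4, 5, 6, 5, 4]  # hypothetical sample, n = 5, range 2
    # The p = .95 entry for numerator size 5 and denominator size 5 is 2.6.
    print(range_ratio(first, second))
```

Here the ratio is 4.0, which exceeds the tabulated 2.6, so the first sample would be judged more variable at that percentage point.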
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INDEX 


A,  alternative  hypothesis,  108 

α, probability of type I error, 107

Abbreviated  Doolittle  method,  177 

Absolute  value,  18 

Acceptable  quality  level  (AQL),  486 

Acceptance 

number,  485 

region,  108 

sampling,  477,  485 
Accuracy,  104 
Ackoff,  R.  L.,  15 
Additive law of probability, 31
Additive  property 

chi-square,  82 

normal,  80 

Additivity,  test  for,  338 
Adjusted  value  of 

chi-square,  117 

Yf  in  regression,  199 
Alger,  P.  L.,  8,  9,  14 
Alternative  hypothesis  (A),  108 
American     Management    Association, 

14,  15 

American Society for Testing Materials, 498, 548
American  Standards  Association,  Inc., 

498 
Analysis  of  covariance,  200,  437 

adjusted  means,  442,  446,  450,  453 

assumptions,  438 

completely  randomized  design,  439 

factorial,  452 

Latin  square  design,  449 

multiple,  457 

randomized  complete  block  design, 
444 

uses  of,  437 

variance   of    adjusted    means,    442, 

446,  450,  453 
Analysis    of    enumeration    data    (see 

Enumeration  data) 
Analysis  of  regression,  159 
Analysis  of  variance 

among  means,  133 

among  and  within  groups,  279 

assumptions,  281 

between  and  within  groups,  279 

comparing  individual  means,  303 

completely  randomized  design,  278 


component  of  variance   model,  318, 

324 

computational  procedure,  279 
degrees  of  freedom,  281 
disproportionate  subclass  numbers, 

423 

efficiency,  298 
factorials,  316 

fixed  effects  model,  282,  318,  324 
Graeco-Latin  square  design,  410,  414 
group  comparison,  279 
homogeneity  of  variances,   test  for, 

136 

hypotheses,  284 
Incomplete  block  design,  417 
individual  comparisons,  303 
individual  degrees  of  freedom,  303 
in  regression,  166 
Latin  square  design,  412 
mixed model, 319, 325
Model  I,  fixed  treatment  effects,  282, 

318,  324 
Model  II,  random  treatment  effects, 

282,  318,  324 

Model  III,  mixed  model,  319,  325 
nonconformity    to    assumed    model, 

338 

N-way  classification,  335 
one-way  classification,  279 
proportionate  subclass  numbers,  421 
random  effects  model,  282,  318,  324 
randomized  complete  block  design, 

363 

in  regression,  166 

relation  to  regression  analysis,  340 
relative  efficiency,  300,  373 
relative  information,  300 
selected  treatment  comparisons,  303, 

376 

split-plot  design,  415 
subsampling, 288, 335, 368
table,  280 

three-way  classification,  321 
two-way  classification,  316 
transformations,  340 
Andersen,  S.  L.,  338,  361 
Anderson,  H.  E.,  434 
Anderson,  J.  A.,  15 


Anderson, R. L., 43, 105, 157, 194, 221,

313,  360,  408,  434,  465 
ANOVA (Analysis of variance), 278
Anscombe, F. J., 275, 338, 360, 434
AOQ, average outgoing quality, 487
AOQL,  average  outgoing  quality  limit, 

488 

AOV  (Analysis  of  variance),  278 
Approximations 

binomial to hypergeometric, 74
normal  to  binomial,  76,  116 
Poisson  to  binomial,  75 
AQL, acceptable quality level, 486
Arbitrary  origin,  54 
Arcsine  transformation,  340 
Arithmetic  mean,  53 
Arnoff, E. L., 15
Array,  47 

ASN,  average  sample  number,  141,  489 
Association,  measures  of,  222 
Asterisk 

*Significant at α = 0.05, 343
**Significant at α = 0.01, 343
Attribute  data  (see  Enumeration  data) 
Attributes 

acceptance  sampling  by,  485 
double  sampling,  485,  488 
multiple  sampling,  485,  490 
sampling  inspection  by,  485 
sequential  sampling,  485,  490 
single  sampling,  485 
Average    (see    Mean,    Median,    Mode, 

etc.),  47 

Average  outgoing  quality  (AOQ),  487 
Average        outgoing        quality        limit 

(AOQL), 488

Average  sample  number    (ASN),    141, 
489 


b, number of blocks, 364

b, sample regression coefficient, 164

β, population regression coefficient, 164

β, probability of type II error, 107

Balancing,  251,  268 

Bancroft,  T.  A.,  43,  105,  157,  408,  434, 
465 

Barbacki, S., 275

Bard,  J.  C.,  9,  15 

Bartlett,  M.  S.,  157,  338,  340,  361,  457, 
465 

Bartlett's test for equality (homogeneity) of variances, 136, 338

Bayes' theorem, 32

Bazovsky,  I.,  498,  510 

Beatty,  G.  H.,  100,  106 

Bechhofer, R. E., 310, 361

Bernhard,  F.  L.,  10,  15 


Bernoulli  trials,  32 
Beta 

distribution,  39 

function,  22 
Beveridge, W. I. B., 15
Bias,  104 

Bicking,  C.  A.,  265,  266,  271,  275,  434 
Bimodal  distribution,  59 
Bingham,  R.  S.,  Jr.,  275 
Binomial 

approximation to hypergeometric, 74

chi-square  approximate  test,  117 

confidence  intervals,  94 

distribution,  38,  74 

estimation,  94 

expansion,  21 

expected  value,  38 

mean,  38,  74 

normal  approximation,  76,  116 

Poisson  approximation,  75 

population,  74 

probabilities,  32 

probability  function,  38,  74 

standard  deviation,  38,  74 

tests  of  hypotheses,  115,  128 

variance,  38,  74 
Birnbaum,  Z.  W.,  474 
Biserial  correlation,  231 
Bivariate  normal  population,  198,  225 
Blocking,  251,  268,  270 
Blocks 

incomplete,  417 

randomized  complete,  268,  363 
Blum,  J.  R.,  475 

Bowker,  A.  H.,  100,  105,  113,  114,  115, 
121,   123,   124,   157,  221,  243,  498, 
504,  510 
Box,  G.  E.  P.,  277,  338,  361,  434,  435, 

436,  502,  510 

Breipohl,  A.  M.,  503,  504,  510 
Bross,  I.  D.  J.,  275 
Brown,  G.  W.,  473,  475 
Brownlee,  K.  A.,  43,  105,  157,  221,  243, 

275,  361,  408,  435,  475 
Brunk,  H.  D.,  475 
Bryan,  J.  G.,  43,  106,  158 
Budne,  T.  A.,  275,  435 
Burman, J. P., 436
Buros,  O.  K.,  15 
Burr,  I.  W.,  75,  86,  498 
Bush,  G.  P.,  15 


C,  coefficient  of  contingency,  232 
c-chart 

center  line,  480 

control  limits,  480 
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example  of,  483 
Calabro,  S.  R.,  510 
Calvert,  R.  L.,  94,  105 
Caplan,  F.,  276 
Carroll,  M.  B.,  435 
Causes,  chance,  477 
Center  line  (for  control  charts),  478 
Central 

limit  theorem,  72 
moments  (definition),  36 
tendency  (measures  of),  53 
Chance  variable,  30 
Chapanis,  A.  R.  E.,  15 
Chapin,  F.  S.,  276,  435 
Chew,  V.,  221,  276,  341,  361,  408,  415, 

435 
Chi-square 

addition  theorem,  82 

adjusted  value,  117 

definition,  81 

degrees  of  freedom,  81 

density  function,  40,  82 

distribution,  37,  40,  81 

expected  value,  40 

heterogeneity,  128 

interaction,  129 

mean,  40 

parameter  of,  81 

pooled,  128 

reproductive  property,  82,  128 

small  expected  numbers,  127 

table, 523

tests, 

Bartlett's  test,  136 

binomial,  117 

contingency  tables,  129 

for  independence,  129 

goodness  of  fit,  126,  466 

homogeneity  of  variances,  136 

multinomial,  124 

Poisson  data,  125 

rXc  table,  129 

variance    of    normal    population, 

114 

total,  128 
variance,  40 

Chorafas, D. N., 498, 510
Churchman, C. W., 15
Clark, C. R., 486, 488, 498
Class 

frequency,  49 
interval,  47 

end  points,  47 
midpoint,  47 
Classification 
N-way, 335
one-way,  279 
three-way,  321 


two-way,  316 
Clopper,  C.  J.,  94,  105 
Cochran, W. G., 261, 276, 303, 338,

361,  374,  408,  435,  457,  465,  475 
Coding,  54,  62,  239 
Coefficient 

contingency,  232 

correlation,  222 

regression,  164 
Coefficient  of 

alienation,  225 

biserial  correlation,  231 

concordance,  233 

contingency,  232 

correlation,  222 

determination,  225 

linear  correlation,  224 

multiple  correlation,  228 

non-determination,  225 

partial  correlation,  228 

partial  regression,  229 

product-moment  correlation,  225 

tetrachoric  correlation,  232 

variation (CV), 64
Coefficients 

for  orthogonal  polynomials,  194,  314 

for  selected  treatment  comparisons, 

262 

Cohen,  M.  R.,  15 
Colcord, C. C., 529
Combinations 

of  n  things  taken  r  at  a  time,  20 

treatment,  252 

Combining  experimental  results,  128 
Comparison,  261 
Comparison  of 

means,  262,  310 

percentages,  131 

proportions,  131 

standard  deviations,  123 

variances,  123 
Comparisons 

among  means,  262,  306,  310 

designed,  262,  306 

group 

equal  size,  280 
unequal  size,  279 

individual,  262,  306,  310 

orthogonal,  263,  306,  376 

selected,  262,  306 
Complement,  17 
Completely  randomized  design,  268 

analysis of covariance, 439

analysis  of  variance,  280 

assumptions,  281 

calculations,  279 

comparison   of  selected  treatments, 
306 


Completely  randomized  design  (cont'd) 
components  of  variance,  285,  298 
definition  of,  278 
degrees  of  freedom,  281 
examples  of,  278 
expected  mean  squares,  282,  298 
factorials,  316 

individual  degrees  of  freedom,  304 
relative  efficiency,  300 
standard  error  of  a  treatment  mean, 

285 

subsampling,  288,  335 
t-test, 288
table,  280 

tests  of  hypotheses,  284,  327 
variance  of  a  treatment  mean,  285 
Component  of  variance 
definition  of,  237 
estimate  of,  299 

Component  of  variance  model,  282 
Components of variance, 299
Composite  design,  424 
Concomitant 

information,  159 
variable,  159 

Concordance,  coefficient  of,  233 
Conditional 

distribution,  35 
probability,  31 
Confidence 
coefficient,  90 

interval,  definition  of,  88,  89 
interval  statement,  91 
limits 

definition,  90 
one-sided,  90 
two-sided,  90 
limits  for 

binomial  parameter,  94 

contrast,  309 

correlation  coefficient,  226 

difference  between  two  means,  95 

fraction,  94 

intercept  of  straight  line,  171 

mean,  univariate,  90,  92 

mean,  in  regression,  171 

percentage,  94 

proportion,  94 

ratio  of  two  standard  deviations, 

97 

ratio  of  two  variances,  97 
regression  coefficient,  170 
slope  of  straight  line,  170 
standard  deviation,  93 
variance,  93 
Confounded,  246 
Confounding,  248 
Connor,  W.  S.,  277,  435 


Consistent  estimator,  87 
Consumer's  risk,  487 
Contingency 

coefficient  of,  232 
tables,  129 

2X2,  or  fourfold,  131,  132 
rXc,  129 
N-way, 131
Continuity,     correction    for,    77,     116, 

131 
Continuous 

distribution,  33 
distribution  function,  33 
probability  density  function,  33 
random  variable,  30 
variable,  30 
Contrast,  261,  306,  376 
Control 

lack  of,  478 
local,  246,  250 
quality,  477 
state  of,  478 
statistical,  478 
Control  charts 
center  line,  480 
definition  of,  477 
factors  for  computing  limits,  549 
interpretation  of,  479 
limits,  definition  of,  480 
types 

c-chart    (for    number   of    defects), 

479,  483 
p-chart (for fraction defective),

479, 481

R-chart (for ranges or variability),
479, 481

X̄-chart (for averages), 479, 481
Coons,  Irma,  465 

Correction  for  continuity,  77,  116,  131 
Correlated  observations,  96,   121,   159, 

222 

Correlation 
analysis,  222 

between  mean  and  variance,  340 
biserial,  231 
coefficient,  222 
definition,  36 
homotypic,  235 
index,  223 
intraclass,  235 
methods,  159,  222 
multiple,  227 
partial,  228 
product-moment,  36 
rank,  233 
ratio,  229 

sample  coefficient,  224 
several  samples,  227 
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simple,  223 
spurious,  236 
sums  and  differences,  238 
tetrachoric,  232 
Coutie,  G.  A.,  434 
Covariance

analysis  of,  200,  437 
definition  of,  36 

Cox,  D.  R.,  261,  273,  274,  276,  435,  465 
Cox,  G.  M.,  15,  261,  276,  361,  408,  435, 

457,  465 

Cramer,  H.,  28,  30,  43 
Crampton,  E.  W.,  457,  458,  465 
Critical  region,  109 
Crow,  E.  L.,  498 
Crump, S. L., 276
Cumulative  distribution,  33,  52 
Cumulative  distribution  function 
continuous,  33 
definition  of,  33 
discrete,  33 
properties,  33 
Cumulative  frequency 
distribution,  49,  52 
table,  52 

Cumulative  relative  frequency,  52 
Curve  fitting,  160 
Curvilinear  regression,  190 
CV,  coefficient  of  variation,  64 


D, difference between paired observations, 96, 121
D, statistic in the Kolmogorov-Smirnov test, 471

table  of  critical  values,  560 
Daniel,  C.,  435 
Data,  missing,  390,  412 
David,  H.  A.,  338,  361 
Davies,  G.  R.,  210 

Davies,  O.  L.,  157,  167,  221,  261,  276, 
361,  408,  418,  419,  425,  435,  550, 
552,  554,  555 
Davis,  F.  A.,  498 
DeBaun,  R.  M.,  435 
Decile  limits,  57 
Decisions 

correct,  108 
incorrect,  108 
Defect,  483 
Defective 
item,  481 

proportion,  94,  115,  481 
Defects,  control  chart  for  (see  c-chart), 

483 
d.f.,    degrees   of  freedom,   61,   81,   281, 

329 


Degrees  of  freedom  (d.f.),  61,  81,  281, 

329 

DeLury,  D.  B.,  276 
Deming,  L.  S.,  529 
Density  function,  33 
chi-square,  40 
continuous  probability,  33 
exponential,  39 
F,  40 

gamma,  39 
normal,  39 

probability  (p.d.f.),  33 
relation  to  cumulative  function,  33 
standard  normal,  39 
t,  40 

Weibull,  39 

Dependent  variable,  159 
Derman,  C.,  43 
Design 

completely  randomized,  268,  278 
composite,  424 
experimental,  244 
advantages,  271 

check  list  of  pertinent  points,  265 
disadvantages,  271 
examples  of  approach  to,  266 
nature  and  value  of,  244 
principles  of,  246 
steps  in,  264 
of  experiments,  244 
Graeco-Latin  square,  410 
incomplete  block,  417 
Latin  square,  410 
random  balance,  425 
randomized     complete     block,     268, 

363 

split  plot,  271,  415 
Descriptive  statistics,  44 
Determinant

calculations,  24 
definition,  23 
Deviation 

from  mean,  61 
from  regression,  162 
standard,  36 

Dichotomous  data  (see  Binomial) 
Dick,  I.  D.,  339,  362 
Difference  between 

paired observations (D), 96, 121
two  means,  81,  95,  119,  288 
two  proportions,  132 
Differences

among  correlation  coefficients,  226 
among  means,  133,  279 
between  means,  81,  95,  119,  288 
among  variances,  136 
Differential  response,  258 
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Digits 

random,  46 

table  of,  544 
Discrete 

distribution  function,  33 

probability  function,  33 

random  variable,  30 
Dispersion,  measures  of,  60 
Disproportionate subclass numbers,

423 
Distribution 

beta,  39 

binomial,  38,  74 

bivariate,  34 

bivariate  normal,  198,  225 

chi-square,  37,  40,  81 

conditional,  35 

continuous,  33 

cumulative,  33 

difference  between  two  means,  81 

discrete,  33 

empirical,  49 

expected  value  of,  35 

exponential,  39 

F,  37,  40,  84 

frequency,  47 

function,  33 

gamma,  39 

Gaussian,  39 

geometric,  38,  79 

hypergeometric,  38,  73 

joint,  34 

of    linear    combinations    of    random 
variables,  80 

marginal,  34 

of  mean,  80 

multinomial,  78 

negative  binomial,  38,  79 

normal,  39 

Poisson,  37,  38 

probability,  33 

of  s2  and  s,  83 

sampling,  70 

of  standard  deviation,  83 

standard  normal,  37,  39 

"Student's"  t,  37,  40,  83 

of  sum  of  squares,  82 

t,  37,  40,  83 

of  variance,  83 

Weibull,  39 
Distribution-free 

methods,  466 

tests,  466 

tolerance  limits,  100 
Dixon, W. J., 105, 157, 221, 243, 338,

361,  408,  475,  529,  556 
Dodge,  H.  F.,  498 
Doolittle  solution,  abbreviated,  177 


Double  sampling,  485,  488 

advantages  of,  489 

nature  of,  488 
Duffett,  J.  R.,  276 
Dummer,  G.  W.  A.,  498,  510 
Dummy  treatments,  374 
Duncan,  A.  J.,  498 
Duncan,  D.  B.,  310,  361 
Dunnett,  C.  W.,  310,  361 
Dykstra,  O.,  Jr.,  435 


E*,  correlation  ratio,  229 

EVOP,  evolutionary  operation,  501 

ε, error variable, 161

Economy,  resource,  245 

Effect,  definition  of,  257 

Effects 

fixed,  282 
interaction,  257 
main,  257 
random,  282 
Efficiency 

relative,  300,  373 
statistical,  245 
Efficiency  of 

Latin  square,  414 
randomized  complete  block,  375 
range,  63 
Eisenhart,  C.,  100,  106,  158,  221,  338, 

361,  476,  499,  558 
Element,  17 

Empirical  distribution,  49 
English,  T.  S.,  147 
Enumeration  data 

acceptance  sampling  plans,  485 
binomial 

correction  for  continuity,  77,  116, 

131 

estimation,  94 
normal  approximation,  76 
Poisson  approximation,  75 
tests  of  hypotheses 
exact,  116 

normal  approximation,  116 
chi-square  approximation,   117 
control  charts  (c  and  p),  479 
double  dichotomy  (2X2  table) 
correction  for  continuity,  131 
independence,    approximate    test, 

131 

independence,  exact  test,  132 
goodness  of  fit  test,  126 
hypergeometric,  73 

binomial  approximation,  74 
independence  in  two-way  tables 
2X2,  131 
rXc,  129 
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multinomial,  78,  124 

chi-square  approximation,  124 
percentages  or  proportions,  74,  115 
comparison  of  two,  131 
comparison  of  k,  128 
proportions,  74,  115 

confidence  intervals,  94 
tests,  115,  128,  131 
Poisson,  75,  124 

tests,  124 
Equality 

of  means,  119,  133 
of  variances,  123,  136 
Equally  likely  events,  29 
Equations 

estimating,  163 
linear,  24,  164 
normal,  164 
predicting,  163 
regression,  163 

simultaneous,  24,  162,  164,  178 
Error 

of estimate, 170
experimental,  246,  247 
partitioning  of,  340,  376 
reduction  of,  248 
in  independent  variable,  198 
kind 

Type  I,  107 
Type  II,  107 
sampling,  291 
standard 

of  contrast,  309 
of  estimate,  170 
of  mean,  91,  285,  366 
of  regression  coefficient,  170 
Estimate 
biased,  87 
interval,  88 
minimum  variance,  87 
point,  87 
unbiased,  87 
Estimates 
best,  87 

maximum  likelihood,  89 
repeatability  of,  104 
Estimating  equation,  163 
Estimation,  87 

confidence  interval 

for    difference    between    means    of 

two  normal  populations,  95 
for  mean  of  a  normal  population, 

90 

for  a  proportion,  94 
for slope and intercept of a straight line, 170
for standard deviation of a normal population, 93
for variance of a normal population, 93
statement,  91 

point 

maximum  likelihood,  88 
method  of  least  squares,  88 
method  of  moments,  88 
minimum  chi-square,  88 
Estimator 

consistent,  87 

definition,  61 

interval,  88 

minimum  variance,  87 

point,  87 

properties,  87 

unbiased,  61,  70,  87 
Estimators 

maximum  likelihood,  89 

methods  of  obtaining,  88 
Event,  29 

Evolutionary  operation  (EVOP),  501 
EVOP  (evolutionary  operation),  501 
Expectation 

mathematical,  33 

of   mean   squares    in    ANOVA,    282, 

328 

Expected  frequency,  127 
Expected  mean  squares,  282,  328 
Expected  numbers,   117,  124,  127,  130 
Expected  value 

definition,  33,  35 

notation,  33,  35 
Expected  values  of  (or  for) 

beta  distribution,  39 

binomial  distribution,  38,  74 

chi-square  distribution,  40 

constant,  36 

constant  times   a  random  variable, 
36 

continuous  random  variable,  36 

discrete  random  variable,  36 

exponential  distribution,  39 

F-distribution,  40

gamma  distribution,  39 

geometric  distribution,  38 

hypergeometric  distribution,  38,  74

linear  combinations  of  random  variables,  80

negative  binomial  distribution,  38 

normal  distribution,  39 

Poisson  distribution,  38 

t-distribution,  40
Weibull  distribution,  39 
Experiment 

definition  of,  30 

design  of,  244 
Experimental  design 

nature  of,  4,  244 
Experimental  Design  (continued) 

principles  of,  246 

purpose  of,  245 

steps  in,  264 
Experimental 

error,  246,  247 

units,  247 
Experimentation,  4 
Exponential 

distribution,  39 

regression,  190,  194 
Extrapolation,  173 
Extreme  values,  84 
Ezekiel,  M.,  196,  221 


F-ratio,  less  than  unity,  301 
F-variate 

definition  of,  83 

degrees  of  freedom,  83 

distribution,  37,  40,  84 

expected  value,  40 

parameters  of,  83 

table  of,  529 

f,  observed  frequency,  50
Factor 

definition,  254 

level,  254 

symbolism,  254 
Factorial 

design  (see  Factorials) 

experiment  (see  Factorials) 

notation  (n!),  20
Factorials 

advantages,  272 

analysis  of  covariance,  452 

calculations,  317,  322 

completely  randomized  design,  316 

component  of  variance  model,   318, 
324 

confounding,  420 

definition  of,  254 

degrees  of  freedom,  329 

disadvantages,  272 

effects,  257 

expected  mean  squares,  327 

fixed  effects  model,  318,  324 

fractional,  419 

interaction 

definition  of,  257 
effects,  257 

main  effects,  257 

meaning  of,  254 

mixed  model,  319,  325 

Model 

I  (fixed),  318,  324 

II  (random),  318,  324 

III  (mixed),  319,  325 


notation,  255 

response  curves,  336 

standard  error  of  effects,  309 

subsampling,  335 

terminology,  254 

tests  of  hypotheses,  327 

variance  of  effects,  309 

without  replication,  418 
Failure  rate,  509 
Fattu,  N.  A.,  475
Federer,  W.  T.,  261,  276,  310,  361,  408, 

435,  465 

Feigenbaum,  A.  V.,  498,  510 
Festinger,  L.,  475

Fiducial  limits  (see  Confidence  limits) 
Finite  population  correction  factor,  71 
Finney,  D.  J.,  261,  276,  435,  465 
First  kind  of  error,  107 
Fisher,  R.   A.,  88,   105,   226,   243,  275, 

276,  435,  528 

Fisher's  exact  test  in  2X2  table,  132 
Fitting 

of  constants,  179 

of  curves,  160 
Flagle,  C.  D.,  15 
Fractile,  36 

Fractional  factorials,  419 
Fraction  defective 

control  chart  for,  481 

estimation  of,  94 

tests  of  hypotheses  about,  115 
Fraser,  D.  A.  S.,  475 
Freedman,  P.,  15 

Freedom,  degrees  of,  61,  81,  281,  329 
Freeman,  H.  A.,  499 
Frequency 

class,  47 

cumulative,  49 

distribution,  47 

histogram,  49 

observed  (f),  50

polygon,  49 

relative,  48 

relative  cumulative,  52 

table,  50 

tally,  47 

Freund,  J.  E.,  105,  157 
Friedman,  M.,  499 
Friedman,  L.,  16 
Fry,  Thornton  C.,  127,  472
Function 

beta,  22 

cumulative  distribution,  33

density,  33 

distribution,  33 

gamma,  21 

likelihood,  73,  89 

OC,  108,  485 


operating  characteristic,  108,  485
power,   108 

probability  density,  33 
regression,  159 
response,  160 
Functional  relations,  159 

G

Gamma

distribution,  39 

function,  21 
Geffner,  J.,  16 
Generalized  interaction,  329 
Geometric  distribution,  38,  79
Gilbert,  S.,  276 
Gompertz  curve,  190 
Good,  C.  V.,  15
Goode,  H.  H.,  15
Goodness  of  fit  tests 

chi-square,  126 

in  regression  (lack  of  fit),  188 

Kolmogorov-Smirnov,  471 
Gosset,  W.  S.,  277 
Graeco-Latin  squares,  410,  414
Grandage,  A.  H.  E.,  188,  221
Grant,  E.  L.,  477,  484,  499
Graphical  representation,  51,  341
Graybill,  F.  A.,  28,  221,  341,  361 
Greek  alphabet,  511 
Griffin,  N.,  498,  510
Grouping,  251,  268 
Growth  curve  (see  Exponential) 
Gryna,  F.  M.,  Jr.,  15,  499,  510

H 

H,  hypothesis  (or  null  hypothesis),  108

Hader,  H.  J.,  188,  221 

Hald,  A.,  43,  94,  105,  157,  523,  529 

Hamaker,  H.  C.,  339,  361

Hartley,  H.  O.,  85,  86,  310,  361 

Hastay,  M.  W.,  100,  106,  158,  499

Hattery,  L.  H.,  15

Hawley,  G.  O.,  15

Hay,  W.  A.,  435 

Hazard  rate,  43,  509 

Heterogeneity 

chi-square,  128 

of  variances,  339 
Heterogeneous  variances,  339 
Heteroschedasticity,  339 
Hillway,  T.,  15
Histogram,  frequency,  49 
Homogeneity  of  variances 

assumption  of,  95,  119,  133,  169,  197, 
282,  338 

test  for,  136 

Homogeneous  variances,  197 
Homoscedasticity,  197 
Homotypic  correlation,  235
Hopkins,  J.  W.,  457,  458,  465 
Horton,  W.  H.,  435 
Hunter,  J.  S.,  188,  189,  221,  276,  277,  434,  436,  502,  510

Huntsberger,  D.  V.,  43,  105,  158
Hutt,  F.  B.,  6,  15
Hypergeometric  distribution,  38,  73

binomial  approximation,  74 

use  in  acceptance  sampling,  486

use  in  Fisher's  exact  test,  132 
Hypotheses 

alternative,  108 

composite,  109 

null,  108 

simple,  109 

tests  of,  107 

I 

Identities,  20 

Incomplete  block  design,  417 

Independence 

in  a  two-way  table,  129 

mutual,  31 

pairwise,  31

statistical,  31 
Independent 

events,  31 

variables,  159 
Index  of  summation,  19 
Indifference  quality,  486
Individual  comparisons,  262,  303,  376
Inference,  statistical,  87,  107 
Inferences  about  populations,  87,   107 
Information,  relative,  300 
Inspection 

by  attributes,  485 

by  variables,  485 

sampling,  485 

Interaction,  definitions  of,  257 
Intersection,  17 
Interval 

class,  47 

confidence,  88 

estimate,  88 
Intervals  for,  confidence 

difference  of  two  means,  95 

mean,  90 

proportion,  94 

ratio  of  two  variances,  97 

standard  deviation,  93 

variance,  93 

Intraclass  correlation,  235 
Inverse  of  a  matrix,  24 
Investigations,  experimental,  244 
Iterative  procedure,  393 
Jeffreys,  H.,  15,  276
Jevons,  W.  S.,  15 
Johnson,  P.  CX,  15 
Joint 

distribution,  34

probability,  31 

probability  function,  34 

probability  density  function,  34 
Juran,  J.  M.,  499

K 

Kemeny,  J.  G.,  28,  43 

Kempthorne,    O.,   221,   261,    264,   266, 

267,  276,  277,  335,  341,  361,  408, 

436,  465 
Kendall,  M.  G.,  193,  221,  233,  234,  243, 

475 

Keuls,  M.,  310,  361 
Kimball,  G.  E.,  16 
Klein,  M.,  43 
Kolmogorov-Smirnov  test 
discussion,  338,  471 
table  of  critical  values,  560 
Koopmans,  L.  H.,  486,  488,  498 
Kramer,  C.  Y.,  221,  310,  361 
Kruskal,  W.  H.,  475 


λ,  parameter  of  Poisson  distribution,  38
Lack  of  fit,  188 

test  for  (in  regression),  188 
Latin  square,  410 
Latin  square  design 

analysis  of  covariance,  449 

analysis  of  variance,  412 

assumptions,  411 

calculations,  411 

comparison   of  selected   treatments, 
414 

components  of  variance,  414 

degrees  of  freedom,  412 

examples  of,  410 

expected  mean  squares,  412

individual  degrees  of  freedom,  414 

missing  observations,  412 

relative  efficiency,  414 

response  curves,  414 

subsampling,  414 

test  of  hypothesis,  412 
Law  of  large  numbers,  72 
Laws   of  probability,   31 
Least  significant  difference  (LSD),  310 
Least  squares,  method  of,  88,  161 
Leone,  F.  C.,  276 
Level  of  factor,  254 
Level  of  significance,  108 


Lewis,  W.,  147 

Lieberman,    G.   J.,   86,    100,    105,    113, 
114,  115,  121,  123,   124,  157,  221, 
243,  486,  498,  499,  504,  510 
Likelihood  function,  73,  89 
Limits 

confidence  (see  Confidence  limits) 

control  (see  Control  limits),  480 

decile,  57 

natural  tolerance,  98,  502 

percentile,  57 

quartile,  57 

of  summation,  19 

specification,  502 

tolerance,  98,  502 
Linear 

combination  of  random  variables,  80 

correlation,  coefficient  of,  224 

equations,  24 

model,  164 

regression,  164 

relationship,  164 

statistical  model,  164 
Linearity 

assumption  of,  164 

of  regression,  164 

test  for,  188 
Link,  R.  P.,  510,  561 
Lipow,  M.,  499,  510
Livermore,  P.  E.,  105,  157 
Lloyd,  D.  K.,  499,  510 
Local  control,  246,  250 
Location,  measure  of,  55 
Logarithmic  transformation,  340 
Logic,  3 

Logistic  curve,  190 
Lord,  E.,  510,  561 
Lot 

acceptance     procedures     based      on 
sampling  from,  485 

sampling  from,  485 

tolerance  percent  defective  (LTPD),  487

LSD,  least  significant  difference,  310
LTPD,  lot  tolerance  percent  defective,  487

Lush,  J.  L.,  6,  15 
Luszki,  M.  E.  B.,  15 

M 

M,  sample  median,  55 
MO,  sample  mode,  58 
MR,  sample  midrange,  55 
μ,  population  mean,  36
Machol,  R.  E.,  15
Main 

effect,  257 

plot,  415 
Mandelson,  J.,  271,  276
Mann,  H.  B.,  475 
Marginal 

distribution,  34 

probability,  35 

probability  density  function,  35 
Massey,  F.  J.,  105,  157,  221,  243,  338, 

361,  408,  475,  529,  556,  560 
Mathematical 

concepts,  17 

expectation,  33 

model,  163 
Matrices 

addition  of,  23 

multiplication  of,  23 
Matrix 

definition  of,  22 

determinant  of,  23 

dimension  of,  22 

diagonal,  23 

identity,  23 

inverse  of,  24 

nonsingular,  24 

null,  23 

square,  23 

symmetric,  23 

transpose  of,  23 
Maxfield,  M.  W.,  498 
Maximum  likelihood,  88 
McAfee,  N.  J.,  15,  499,  510 
McNemar,  Q.,  231,  232,  233,  243 
Mean 

adjusted,  442,  446,  450 

arithmetic,  53 

confidence  interval  for,  91,  92 

confidence  limits  for,  91 

correction  for,  165 

definition  of,  53 

deviation  from,  53 

difference,  96,  121 

distribution  of,  80 

estimation  of,  70,  90 

of  probability  distribution  (see  Expected  value)

population  (μ),  36

sample  (jy),  53 

square,  134 

standard  error  of,  91 

tests  concerning,  113 

time  between  failure   (MTBF),   509 
Means 

differences  among  several,   133,   279 

difference  between  two,  95,  119,  288 
Measure 

of  location,  55 

of  position,  55 

of  precision,  60 
Median 
definition,  36

sample  (M),  55 

tests,  473 

Merrington,  M.,  529 
Method  of  least  squares,  88,  161
Midrange  (MR),  55 
Military  Standards 

105C,  499 

414,  499 

Miller,  D.  W.,  16 
Miller,  I.,  105,  157 
Minimum  variance  estimator,  87 
Missing  data  or  plot 

Latin  square,  412 

randomized  complete  block,  390 
Mitscherlich  curve,  190 
Mixed  model,  319,  325 
MO  (mode),  58 
Mode 

absolute,  58 

definition,  36 

relative,  58 

sample  (MO),  58 
Model 

analysis  of  variance,  282,  318,  324 

component  of  variance,  282,  318,  324 

for  completely  randomized  design,  281

for  covariance

completely  randomized  design,  438

Latin  square  design,  438
randomized  complete  block  design,  438

two-factor  factorial  in  an  RCB  design,  438

for  factorials,  316 

for  Latin  square,  411 

for  randomized  complete  block,  364 

for  regression,  163 

linear,  281 

mathematical,  163 

mixed,  319,  325 

statistical,  163 
Molina,  E.  C.,  512
Moments 

definition,  36 

of  population,  36 

of  sample,  70
Mood,  A.  M.,  43,  78,  86,  105,  131,  158, 

276,  473,  475 
Morse,  P.  M.,  16 
Moses,  L.  E.,  475
Mosteller,  F.,  43,  499
MR  (midrange),  55
MTBF  (mean  time  between  failure),  509
Muench,  J.  O.,  94,  105 
Multinomial

distribution,  78 

population,  78 

tests  of  hypotheses,  124 
Multiple 

correlation,  227 

covariance,  457 

regression,  178 

sampling,  485,  490 
Multiplication 

of  matrices,  23 

of  probabilities,  31 

Multiplicative  law  of  probability,  31 
Murphy,  R.  B.,  101,  105 
Mutual  independence,  31 

N 

N,  size  of  lot  or  population  or  stratum, 

71,  73,  485 
N(μ,  σ),  normally  distributed  with  mean  μ  and  standard  deviation  σ,  80
n,  sample  size,  46

n₀,  average  group  size  in  ANOVA,  283
ν,  degrees  of  freedom,  81
ν,  parameter  in  χ²-distribution,  81
ν,  parameter  in  F-distribution,  83
ν,  parameter  in  t-distribution,  83
Nagel,  E.,  15 

National  Applied  Mathematics  Labo 
ratories,  245 
National  Bureau  of  Standards,  43,  74, 

86,  276,  436 
Natural  tolerance  limits,  98,  502 

relation  to  specification  limits,  502 
Near  stationary  region,  424 
Negative  binomial  distribution,  38,  79 
Negative  correlation,  224 
Newman,  D.,  310,  361 
Nominal  value,  502 
Noncentral  t-distribution,  113
Non-linear 

functions  of  random  variables,  505 
regression,  190 
Nonparametric 
methods,  466 
tests 

goodness-of-fit  test,  126 
Kolmogorov-Smirnov  test,  471 
median  tests,  473 
run  tests,  470 
sign  test,  466 
signed  rank  test,  468 
Wilcoxon  signed  rank  test,  468 
Normal  approximation 
to  binomial,  76,  116 
to  distribution  of  sample  means,  72 
Normal  distribution 

approximation  to  binomial,  76,   116 


approximation     to     distribution     of 
sample  means,  72 

areas,  table  of,  517 

bivariate,   198,  225 

central  limit  theorem,  72 

density  function  of,  39 

expected  value  of,  39 

equation  of,  39 

mean  of,  39 

parameters  of,  39 

percentage  points  of,  517 

reproductive  property,  80 

standard,  39 

standard  deviation  of,  39 

standardized,  39 

table  of  areas,  517 

tolerance  limits,  98 

variance  of,  39 
Normal  equations,  164 
Normal  population,  bivariate,  198,  225 
Normal  random  variables 

linear  combinations  of 
distribution  of,  80 
mean  of,  80 
variance  of,  80 
Normality 

assumption  of,  338 

test  for,  338 

Normalizing  transformations,  340 
Notation,  18 
Nottingham,  R.  B.,  276 
Null 

hypothesis,  109 

set,  17 
Numbers 

disproportionate,  423 

equal,  280 

expected,  117,  124,  127,  130 

proportionate,  421 

random 

table  of,  544 
use  of,  46 

unequal,  279 


Observations,  paired,  96,  121,  368 
Objectionable  quality  level  (OQL),  487 
OC  curve  (see  Operating  characteristic  curve),  110,  487

OC  function  (see  Operating  characteristic  function),  108,  485
Ogive  curve,  52 
Olds,  E.  G.,  234,  243 
Olmstead,  P.  S.,  475 
Operating  characteristic  curve 

in  acceptance  sampling,  487 

in  tests  of  hypotheses,  110 
Operating  characteristic  function,  108,  485

definition  of,  108

graph  of  (see  Operating  characteristic  curve),  110,  487
OQL,  objectionable  quality  level,  487 
Order  statistics,  84 
Orthogonal 

comparisons,  263,  306,  376 

contrasts,  263,  306,  376 

polynomial,  192,  313 

polynomial  coefficients,  194,  314 
Ostle,  B.,  15,  16,  101,  105 
Outcome  of  an  experiment,  30 
Owen,    D.   B.,   86,    100,    101,    105,    106, 
486,  499 


P  (...),  probability,  29 

p  chart 

center  line,  480 
control  limits,  480 
example  of,  481 
interpretation  of,  481 

Π,  product  sign,  20

Paired  observations,  96,  121,  368 

Pairing,  96,  121,  239,  368 

Pairwise  independence,  31

Parabolic  regression,  191 

Parameters  of 

beta  distribution,  39 
binomial  distribution,  38,  74 
chi-square  distribution,  40,  81 
exponential  distribution,  39 
F  distribution,  40,  83 
gamma  distribution,  39 
geometric  distribution,  38,  79 
hypergeometric  distribution,  38,  73
multinomial  distribution,  78
negative  binomial  distribution,  38,  79

normal  distribution,  39 
Poisson  distribution,  38 
standard  normal  distribution,  39 
t  distribution,  40,  83 
Weibull  distribution,  39

Partial 

correlation,  228 
regression  coefficient,  229 

Path  of  steepest  ascent,  424 

Paull,  A.  E.,  372,  373,  408 

Peach,  P.,  276 

Pearson,  E.  S.,  85,  86,  94,  105,  276,  561 

Pearson,  K.,  231,  232,  243,  517 

Percentage  points 

for  chi-square  distribution,  523 

for  F  distribution,  529 

for  normal  distribution,  517 


for  Poisson  distribution,  512 

for  standard  normal  distribution,  517

for  t  distribution,  528 
Percentages,  estimation  and  testing  of 
(see    Binomial    and    Multinomial 
distributions) 
Percentile,  57 
Percentile  limits,  57 
Permutations,  20 
Plackett,  R.  L.,  436
Planning  of  experiments,  244 
Plot,  split  or  sub,  415 
Plot,  missing,  390,  412 
Point  estimate,  87 
Poisson
data,  124
distribution,  37,  38

approximation  to  binomial,  75 

equation  for,  38 

expected  value  of,  38 

mean  of,  38 

probability  function,  38 

relation  to  c-chart,  483 

standard  deviation  of,  38 

table  of  cumulative  probabilities,  512

tests  of  hypotheses,  124 
variance  of,  38 
Polygon,  frequency,  49 
Polynomial 

orthogonal,  192,  313 
regression,  192 
second  degree,  191 
Pomerans,  A.  J.  (see  Taton,  R.),  16 
Pooled 

chi-square,  128 
estimate  of  variance,  95 
Popper,  K.  R.,  16 
Population 

binomial,  38,  73,  74 
bivariate  normal,  198,  225 
correlation  coefficient,  36,  225 
definition,  44 
finite,  71 
mean,  36 
median,  36 
mode,  36 
moments,  36 
multinomial,  78 
normal,  39,  80 
Poisson,  38,  124 

product  moment  correlation  coefficient,  36
standard  deviation,  36 
standard  normal,  39 
variance,  36 
Position,  measure  of,  55 
Positive  correlation,  224 
Power 

function,  108 
of  a  test,  108 
Prater,  N.  H.,  184,  221 
Precision 

definition  of,  104 
measure  of,  60 

Predicted  (or  estimated)  value,  163 
Predicting   (estimating)   equation,   163 
Prediction  (or  regression)  equation,  163

Prediction  interval,  172 
Preliminary  tests  of  significance,  371 
Presentation  of  results,  341 
Principle  of  least  squares,  161 
Principles  of  experimental  design,  246 
Probabilities,  binomial,  32 
Probability 

additive  law,  31 

classical  definition,  29 

compound,  31 

concepts,  30 

conditional,  31,  35 

continuous  distribution,  33 

cumulative  distribution  function  (c.d.f.),  33
definition  of,  29 
density  function  (p.d.f.),  33 
distribution,  33 
discrete  distribution,  33 
estimates  of,  48 
function  (p.f.),  33
independence  (definition),  31 
joint,  34 

laws  of  (general),  31 
marginal,  34 
multiplicative  law,  31 
notation,  29 
operations  with,  31 
properties  (general),  30 
relative  frequency  definition,  30 
total,  31 

Tchebycheff's  inequality  to  estimate,  71

unconditional  (see  Marginal),  34 
Probability  distribution 

associated  with  a  random  variable,  33
continuous 

cumulative  distribution,  33 
expected  value  of,  35 
mean  of,  36 
moments  of 

about  mean,  36 
about  origin,  36 
standard  deviation  of,  36 
variance  of,  36 
definition  of,  33 


discrete 

cumulative  distribution,  33 
expected  value  of,  35 
mean  of,  36 
moments  of 

about  mean,  36 
about  origin,  36 
standard  deviation  of,  36 
variance  of,  36 
Procedures,  test,  111 
Process,  control  of,  477 
Producer's  risk,  487 
Product  moment,  36 

Product  moment  correlation  coefficient,  36

in  the  population,  36 
in  the  sample,  225 
Product  sign  (Π),  20
Properties  of 

arithmetic  mean,  53
coefficient  of  variation,  64 
estimators,  87 
mean,  53 
median,  56 
midrange,  55 
mode,  58 
probability,  30 
range,  60 

standard  deviation,  60 
variance,  60 

Proportion,  estimation  and  tests  of  (see  Binomial  and  Multinomial  distributions)

Proportion  defective,  94,  115,  481 
Proportionate  subclass  numbers,  421 
Purcell,  W.  R.,  276 


Quality,  477 
Quality  control,  477 
Quality  control  charts,  477 

c-chart,  479,  483 

examples  of,  479 

nature  of,  477 

p-chart,  479,  481

R-chart,  479,  481

X̄-chart,  479,  481
Quartile  limits,  57 

Quenouille,  M.  H.,  261,  277,  361,  409, 
436,  465 


R  chart 

center  line,  480 
control  limits,  480 
example  of,  483 
interpretation  of,  483 
R,  coefficient  of  multiple  correlation,  228

R,  sample  range,  60
R²,  correlation  index,  223
r,  linear  correlation  coefficient,  224
r,  sample  correlation  coefficient,  224
r,  statistic  in  the  run  test,  470
table  of  critical  values,  558
r,  statistic  in  the  sign  test,  467

table  of  critical  values,  556
r_I,  sample  intraclass  correlation  coefficient,  236

r_s,  Spearman's  rank  correlation  coefficient,  233

r12.3,  partial  correlation  coefficient,  228
rXc  table,  129

ρ,  population  product-moment  correlation  coefficient,  36
ρ_I,  population  intraclass  correlation  coefficient,  236
Random 

balance,  425 
numbers,  46 

table  of,  544 
Random  sample,  45 
Random  sampling  distributions,  70 
chi-square,  81 

difference  between  two  means,  81 
F,  83 

linear  combination  of  random  variables,  80
mean,  80 

sample  proportion,  74 
standard  deviation,  83 
t,  83 

variance,  83 

Random  sampling  numbers,  46 
Random  variable,  definition  of,  30 
Randomization 
concept,  246 

use  in  experimental  design,  249 
Randomized  blocks,  363 
Randomized  complete  block  design,  268,  363

analysis  of  covariance,  444 
analysis  of  variance,  364 
assumptions,  364 
calculations,  364 
comparison  of  selected  treatments,  376

components  of  variance,  373 
definition,  363 
degrees  of  freedom,  365 
efficiency,  375 
expected  mean  squares,  365
factorial  treatments,  380
individual  degrees  of  freedom,  376,  380


missing  observations,  390 

relative  efficiency,  373 

response  curves,  380 

standard  error  of  treatment  mean,  366
subdivision  of  experimental  error,  376

subsampling,  368
t-test  (paired  observations),  368
tests  of  hypotheses,  366 
variance  of  treatment  mean,  366 
Range,  47 

charts,  480,  481 

definition  of,  60 

distribution  of,  85 

sample,  60,  84 

tests  using,  500 

use  as  an  estimator  of  the  standard  deviation,  63

Rank  correlation,  233,  466 
Rao,  C.  R.,  436 
Ratio,  correlation,  229 
Ratner,  R.  A.,  254,  277 
Reduction  in  sum  of  squares,  187 
Region 

acceptance,  108 
critical,  109 
rejection,  109 
Regression 

abbreviated  Doolittle  method,  177 
analysis,  159 
analysis  of  variance,  166 
assumptions,  168 
average  within  groups,  202 
both  variables  subject  to  error,  199 
cautions  about,  160 
coefficient,  164 
in  comparisons,  312 
and  correlation,  223 
curvilinear,  190 
definition,  159 
deviations  from,  162 
in  each  group,  201 
equation,  163 
curvilinear,  190 
linear,  164 
multiple,  178 
polynomial 

orthogonal,  192 
second  degree,  191 
estimates,  170 
exponential,  190,  194 
general  remarks,  186 
Gompertz,  190 
graphical  interpretation,  162 
of  group  means,  202 
inverse  prediction,  176 
lack  of  fit,  188 
Regression  (continued) 

least  squares,  161 

linear,  164 

logistic,  190 

methods,  159 

Mitscherlich,  190 

multiple,  178 

multiple  linear,  178 

nonlinear,  190 

normal  equations,  164 

orthogonal  polynomials,  192 

parabolic,  191 

partitioning  sum  of  squares,  164 

polynomial,  190 

pooled  coefficient,  202 

quadratic,  191 

reduction  in  sum  of  squares,  187 

relation  to  analysis  of  covariance,  438

relation  to  analysis  of  variance,  340 

several  samples,  201 

second  degree,  191 

second  order  models,  191 

simple  linear,  164 

tests  of  hypotheses,  174 

uses,  205 

weighted,  197 

Rejectable  quality  level  (RQL),  487
Rejection  region,  109 
Relative 

efficiency,  300 

frequency,  30,  48 

definition  of  probability,  30 

information,  300 

standard  deviation,  64 
Relevant  factors  in  an  experiment,  245 
Reliability,  105,  477,  506 
Replicate,  381 
Replication,  128,  246,  270 
Repetition,  246 
Research,  1 

Resource  economy,  245 
Response 

curve,  312 

function,  160 

surface  techniques,  424 

variable,  159 
Riordan,  J.,  28 
Risk 

consumer's,  487 

producer's,  487 

Robertson,   W.  H.,  74,  86,  133,   158 
Romig,  H.  G.,  43,  74,  86,  498 
Rourke,  R.  E.  K.,  43 
Roy,  R.  H.,  15 

RQL,  rejectable  quality  level,  487 
Run  test 

discussion,  470 


table  of  critical  values,  558 
Ryerson,  C.  M.,  15,  499,  510 


s,  sample  standard  deviation,  61 
s²,  sample  variance,  60
s_b1,  standard  error  of  regression  coefficient,  170

s_E,  standard  error  of  estimate,  170
s_X̄,  standard  error  of  sample  mean,  91
s_(X̄1-X̄2),  standard  error  of  difference  between  two  sample  means,  95
s_Ŷ,  standard  error  of  predicted  value  of  dependent  variable  in  a  regression  analysis,  170
Σ,  summation  sign,  19
σ,  population  standard  deviation,  36
σ²,  population  variance,  36
σ_XY,  population  covariance,  36
Saaty,  T.  L.,  14,  16 
Sample 

coefficient  of  variation,  64 

correlation  coefficient,  224 

definition  of,  45 

drawing  of,  46 

judgment,  45 

mean,  53 

median,  55 

midrange,  55 

mode,  58 

moments,  70 

obtaining  of,  46 

point,  29 

probability,  45 

random,  45 
simple,  45 
stratified,  46 
systematic,  46 

range,  60,  84 

regression  coefficient,  164 

simple  random,  45 

size,  136 

space,  29 

standard  deviation,  61 

stratified  random,  46 

systematic  random,  46 

variance,  60 
Sampling 

acceptance,  477,  485 
by  attributes,  485 
by  variables,  485 

distribution,  70 

double,  488 

elements  of,  44 

error,  291 

inspection,  485 

judgment,  45 

multiple,  490 
multistage,  297

plans,  477 

probability,  45 

random,  45,  73 

sequential,  490 

simple,  45 

single,  485 

stratified  random,  46 

systematic  random,  46 

without  replacement,  46,  73 

with  replacement,  46,  74 
Sasieni,  M.,  16 
Satterthwaite,  F.  E.,  277,  302,  329,  361,  425,  436
Savage,  I.  R.,  475 
Scates,  D.  E.,  15 
Scatter  diagram,  162 
Scattergram,  162 
Scheffe,  H.,  310,  311,  361,  475 
Schenck,  J.,  Jr.,  16 
Schneider,  A.  M.,  435 
Science,  2 

Scientific  method,  2,  4 
Second  kind  of  error,  107 
Seder,  L.  A.,  499

Selected  comparisons,  261,  303,  376 
Sequential 

fitting  of  constants,  179 

probability  ratio  test,  140 

sampling,  485,  490 

tests  of  hypotheses,  140 
Series,  20 
Set 

complementary,  17 

definition  of,  17 

null,  17 

number  of  elements  in,  18 

of  all  possible  outcomes  of  an  experiment,  30,  264

theory,  17 

universal,  17 
Shainin,  D.,  277 
Shewhart,  W.  A.,  477,  499 
Siegal,  S.,  475 
Sigma 

Σ,  summation  sign,  19

σ,  population  standard  deviation,  36
Sign  test 

discussion,  466 

table  of  critical  values,  556 
Signed  rank  test 

discussion,  468 

table  of  critical  values,  557 
Significance  level,  108 
Significance  test  (see  Hypothesis,  test  of),  107
Significance  tests 

acceptance  regions,  108 


chi-square  tests,  114,  117,  125,  126,  129,  137
for  coefficients  of  fitted  straight  line,  174

for  comparison  of  two  means,  119 
in  contingency  tables,  129 
critical  regions,  109 
distribution-free,  466 
for  equality  of  means  when  observations  are  paired,  121
for  equality  of  two  percentages,  132
F  test  (see  also  ANOVA  and  Regression),  123

goodness  of  fit,   126,  471 
for  independence,  129 
Kolmogorov-Smirnov,  471 
that  mean  of  normal  population  has  specified  value
one-sided,  113
two-sided,  113
about  mean  of  Poisson  distribution,  124

that  means  of  two  normal  populations  are  equal
one-sided,  122
two-sided,  119

that  means  of  k  >  2  normal  populations  are  equal,  133
median  tests,  473 
nonparametric  tests,  466 
preliminary,  371 
procedures  (general),  107 
about  proportion  or  proportions,  115,  132

ratio  of  two  variances,  123 
in  regression,  174,  192 
rejection  regions,  109 
run  test,  470 
sample  size,  136 
sign  test,  466 
signed  rank  test,  468 
t  test,  113,  119,  121,  174 
that  variance  of  normal  population  has  specified  value
one-sided,  115
two-sided,  114

that  variances  of  k  >  2  normal  populations  are  equal,  136
variance  ratio  test,  123 
Wilcoxon  signed  rank  test,  468 
Simon,  L.  E.,  499 
Simple  random  sample,  45 
Single  sampling,  485 
Sinkbaek,  S.  A.,  523 
Size  of  sample,  136 
Slope  (of  straight  line),  174 
Smirnov,  N.  V.,  475 
Smith,  B.  Babington,  233,  243 
Smith,  H.  Fairfield,  465
Smith,  K.,  475 

Snedecor,  G.  W.,  16,  94,  106,  158,  220, 
221,  243,  277,  339,  362,  409,  436, 
544 

Snell,  J.  L.,  28,  43 
Space,  17,  29 
Spearman,  C.,  233 

Spearman's  rank  correlation  coefficient,  233
Specification  limits
definition,  502
relation  to  statistical  tolerance  limits,  502
Split  plot  (subplot) 
definition  of,  415 
design,  271,  415 

Square  root  transformation,  340
Squares,  least,  88,  161 
Standard  deviation 

of  beta  distribution,  39 

of  binomial  distribution,  38,  74 

of  chi-square  distribution,  40 

confidence  interval  for,  93 

definition,  36 

distribution  of  sample,  83 

estimated  by  sample  range,  63 

of  exponential  distribution,  39 

of  F  distribution,  40 

of  gamma  distribution,  39 

of  geometric  distribution,  38 

of  hypergeometric  distribution,  38,  74

of  linear  function  of  random  variables,  80
of  mean,  80,  91 

of  negative  binomial  distribution,  38 
of  normal  distribution,  39 
of  Poisson  distribution,  38 
of  population,  36 
of  sample,  61 
significance  test  for,  114 
of  standard  normal  distribution,  39 
of  t  distribution,  40 
of  Weibull  distribution,  39 
Standard  error  of 
comparison,  309 
contrast,  309 

difference  between  two  means,  95 
estimate,  170 
mean,  91 

predicted  value  in  regression,  170 
regression  coefficient,  170 
sample  mean,  91 
treatment  comparison,  309 
treatment  mean,  285,  366 
Y,  170 
Standard  normal  distribution,  37 


definition,  39 
table  of  areas,  517 
Starr,  M.  K.,  16 
Statistic,  test,  109 
Statistical 
control,  478

design  of  experiments,  244 
estimation,  87 
independence,  31 
inference,  87,  107 
model,  163 
quality  control,  477 
regularity,  30 
tests  of  hypotheses,  107 
tolerance  limits,  98 
Statistical  Research  Group,  Columbia  University,  106,  158,  499
Statistics 

definition  (as  a  science),  3 
definition  (plural  of  statistic),  2 
descriptive,  44 
order,  84 

relation  to  probability,  17 
relation  to  research,  3 
relation  to  scientific  method,  3 
role  in  research,  1 
scope  of,  2

some  fields  of  application,  5 
Steepest  ascent 
direction  of,  424 
path  of,  424 

Steps  in  designing  experiments,  264 
Straight  lines  (see  Regression),  164
Strata,  46 
Stratification,  46 
Stratified  random  sample,  46 
Stratum,  46 

"Student"  (W.  B.  Gosset),  277 
"Student's"  t  (see  t  distribution),  37,  40,  83
Subclass  numbers 

disproportionate,  423 
proportionate,  421 
Subplot,  415 
Subsample,  288 
Subsampling,  288 
Subset,  18 
Sum 

algebraic,  19 
definition,  19 

of  products,  corrected,  164,  441 
of  squares,  corrected,  165 
of  matrices,  23 
of  products,  164 
of  squares,  162 
Summation 
index,  19 
notation,  19 
pseudo  2-statistics  (Appendix  17),  561

r  in  run  test  (Appendix  15),  558 
r  in  sign  test  (Appendix  13),  556 
random  numbers  (Appendix  7),  544 


size   of    sample    (Appendices    9-12), 
chi-square,  128
of  chi-square  distribution,  40

components  of,  237,  299 

confidence  limits  for,  93 

of  contrast,  309 

definition  of,  36 

distribution  of  sample,  83 

estimation  of,  93 

of  exponential  distribution,  39 

of  F  distribution,  40 

of  gamma  distribution,  39 

of  geometric  distribution,  38 

homogeneity  of,  136,  338 